## Multi-core RISC Processor Design and Implementation (Rev. 2.02)

ELEC5881M - Interim Report

#### Ben David Lancaster

Student ID: 201280376

Submitted in accordance with the requirements for the degree of Master of Science (MSc) in Embedded Systems Engineering

Supervisor: Dr. David Cowell

Assessor: Mr David Moore

#### **University of Leeds**

School of Electrical and Electronic Engineering

August 29, 2019

Word count: 4689

#### **Abstract**

This interim report details four months of progress on a project to design, implement, and verify a multi-core FPGA RISC processor. The project has been split into two stages: first, to build a functional single-core RISC processor, and second, to add multiprocessor principles and functionality to it.

Current multiprocessor and network-on-chip communication methods are discussed, along with how they could be included in this multi-core RISC design. To date, a 16-bit instruction set architecture has been designed featuring common load/store, comparison, and bitwise instructions. A single-core processor has been implemented in Verilog and verified using simulations and test benches running various simple software programs.

Future tasks have been planned and will focus on the second stage of the project. Work will start on designing a loosely coupled multiprocessor communication interface and integrating it with the single-core processor.

## **Revision History**

| Date       | Version | Changes                        |
|------------|---------|--------------------------------|
| 10/04/2019 | 2.02    | Update future stages.          |
| 05/04/2019 | 2.01    | Fix processor RTL diagram.     |
| 04/04/2019 | 2.00    | Initial processor RTL diagram. |
| 01/04/2019 | 1.00    | Initial section outline.       |

Document revisions.

# **Declaration of Academic Integrity**

The candidate confirms that the work submitted is his/her own, except where work which has formed part of jointly-authored publications has been included. The contribution of the candidate and the other authors to this work has been explicitly indicated in the report. The candidate confirms that appropriate credit has been given within the report where reference has been made to the work of others.

This copy has been supplied on the understanding that no quotation from the report may be published without proper acknowledgement. The candidate, however, confirms his/her consent to the University of Leeds copying and distributing all or part of this work in any forms and using third parties, who might be outside the University, to monitor breaches of regulations, to verify whether this work contains plagiarised material, and for quality assurance purposes.

The candidate confirms that the details of any mitigating circumstances have been submitted to the Student Support Office at the School of Electronic and Electrical Engineering, at the University of Leeds.

Name: Ben David Lancaster

Date: August 29, 2019

# **Table of Contents**

- 1 Introduction
  - 1.1 Why Multi-core?
  - 1.2 Why RISC?
  - 1.3 Why FPGA?
- 2 Background
  - 2.1 Amdahl's Law and Parallelism
  - 2.2 Loosely and Tightly Coupled Processors
  - 2.3 Network-on-chip Architectures
- 3 Project Overview
  - 3.1 Project Deliverables
    - 3.1.1 Core Deliverables (CD)
    - 3.1.2 Extended Deliverables (ED)
  - 3.2 Project Timeline
    - 3.2.1 Project Stages
    - 3.2.2 Project Stage Detail
    - 3.2.3 Timeline
  - 3.3 Resources
    - 3.3.1 Hardware Resources
    - 3.3.2 Software Resources
  - 3.4 Legal and Ethical Considerations
- 4 Current Progress
  - 4.1 RISC Core
    - 4.1.1 Instruction Set Architecture
    - 4.1.2 Design and Implementation
    - 4.1.3 Verification
- 5 Future Work
  - 5.1 Project Status
    - 5.1.1 Updated Project Time Line
    - 5.1.2 Future Work
- 6 Conclusion
- References
- Appendix A - Code Listing

## Chapter 1

## Introduction

- 1.1 Why Multi-core?
- 1.2 Why RISC?
- 1.3 Why FPGA?

This report details the design, implementation, and verification of a new multi-core RISC processor aimed at FPGA devices. The project was chosen because of my interest in processor design: I have previously designed only single-core RISC processors and wish to extend this knowledge to gain a first-hand understanding of multi-core communication, design considerations, and the limitations of parallelism.

I will use this opportunity to further develop my knowledge of FPGA and processor design by designing, implementing, and verifying a multi-core RISC processor from scratch, including the design of a communication interface between multiple cores.

### 1.1 Why Multi-core?

Moore's Law states that the number of transistors in a chip will double every 2 years []. CPU designers have used the additional transistors to add more pipeline stages to the processor, reducing the propagation delay of each stage [] and allowing higher clock frequencies.

The size of transistors has been decreasing [] and today they can be manufactured in the sub-10 nanometre range. However, the extremely small transistor size increases electrical leakage and other negative effects, resulting in unreliability and potential damage to the transistor []. The high transistor count produces large amounts of heat and requires increasing power to supply the chip. These trade-offs are currently managed by reducing the input voltage, utilising complex cooling techniques, and reducing clock frequency. These factors limit the performance of the chip significantly and contribute to Moore's Law *slowing* down. The capacity limit of current-generation planar transistors is approaching, so for performance increases to continue, other approaches are employed, such as alternative transistor technologies like multigate transistors [1], software and hardware optimisations, and multiprocessor architectures.

This report will focus on the last of these: producing a small multi-core processor that can utilise software-based parallelism to gain performance benefits compared to a larger single-core design.

### 1.2 Why RISC?

RISC architectures feature simpler and fewer instructions compared to CISC, which emphasises instructions that perform larger tasks. A single CISC instruction might be performed with multiple RISC instructions. Because of the fewer and simpler instructions, RISC machines rely heavily on software optimisations for performance. RISC instruction sets are based on load/store architectures, where most instructions are either register-to-register or memory reading and writing [? ]. This constraint greatly reduces complexity.

RISC architectures are easier to design and implement, especially for beginners, because their simpler instructions share the same pipeline. In CISC designs there may be a different pipeline for each instruction, which would consume far more FPGA resources.

### 1.3 Why FPGA?

Field programmable gate arrays (FPGA) are a great choice for prototyping digital logic designs due to their programmable nature and quick development times.

My experience with FPGAs in previous projects will reduce risk and learning time and allow more time to be spent on adding and extending features (discussed further in section 3.1).

FPGAs, however, may not be suitable for prototyping all register-transfer level (RTL) projects. Larger RTL projects, such as large commercial processors, may greatly exceed the logic cell resources available in today's high-end FPGA devices and may only be prototyped through silicon fabrication, which can be expensive. This resource limitation will not be a problem as the project aims to produce a small and minimal design specifically for learning about multi-core architectures.

## Chapter 2

# **Background**

- 2.1 Amdahl's Law and Parallelism
- 2.2 Loosely and Tightly Coupled Processors
- 2.3 Network-on-chip Architectures

### 2.1 Amdahl's Law and Parallelism

In many applications, not restricted to software, there may exist many opportunities for processes or algorithms to be performed in parallel. These algorithms can be split into two parts: a serial part that cannot be parallelised, and a part that can be parallelised. Amdahl's Law defines a formula for calculating the maximum *speedup* of a process with potential parallelism opportunities when run in parallel on n processors. Speedup is a term used to describe the potential performance improvement of an algorithm using an enhanced resource (in this case, adding parallel processors) compared to the original algorithm. Amdahl's Law is defined below, where the potential speedup $S_p$ is dependent on the portion of the program that can be parallelised, p, and the number of processing cores, n:

$$S_p = \frac{1}{(1-p) + \frac{p}{n}} \tag{2.1}$$

This formula will be used throughout the project to gauge the performance of the multi-core design running various software algorithms.
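As a quick illustration of Equation 2.1, the sketch below evaluates the speedup for a range of core counts. The function name `amdahl_speedup` and the parallel fraction of 0.9 are illustrative assumptions, not values measured in this project.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup S_p from Equation 2.1.

    p: fraction of the program that can be parallelised (0 to 1).
    n: number of processing cores (at least 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # p = 0.9 is an assumed, illustrative value.
    for cores in (1, 2, 4, 8, 16, 32):
        print(f"n={cores:2d}  S_p={amdahl_speedup(0.9, cores):.2f}")
```

As n grows, $S_p$ approaches $1/(1-p)$, so a program that is 90% parallelisable can never run more than 10 times faster, however many cores are added.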

### 2.2 Loosely and Tightly Coupled Processors

Multiprocessor systems can be generalised into two architectures, loosely and tightly coupled, each with advantages and disadvantages. In loosely coupled systems, each processing node is self-contained: each node has its own dedicated memory and IO modules. Communication between nodes is performed over a *Message Transfer System (MTS)* [? ] in a master-slave control architecture.

Scalability in loosely coupled systems is generally easier to implement as each node can simply be appended to the shared MTS interface without large modifications to the rest of the system. Scalability is an important concern in this project as I wish to test the developed solution with a range of processing nodes.

As the nodes of a loosely coupled system feature their own memory and IO modules, they generally perform better in cases where interaction between nodes is not prominent: each node can store a separate part of the software program in its memory module, allowing simultaneous execution of the program.

In scenarios where inter-node communication is prominent, however, access to the MTS interface must be scheduled to avoid access conflicts, which introduces delays and idle time in the software program's execution, resulting in lower throughput. Figure 2.1 shows a general layout of a loosely coupled multiprocessor system.

Tightly coupled systems feature processing nodes that do not have their own dedicated memory or IO modules – each node is directly connected to a shared memory module using a dedicated port. In scenarios where inter-node communication is prominent, tightly coupled systems are generally better suited as nodes are directly connected to a shared memory and do not need to wait to use a shared bus.



**Figure 2.1:** A loosely coupled multiprocessor system. Each node features its own memory and IO modules and uses a Message Transfer System to perform inter-node communication. Image source: [?].

**Figure 2.2:** A tightly coupled multiprocessor system. Nodes are directly connected to memory and IO modules. Image source: [?].

This project will utilise a loosely coupled architecture because it is easier to scale and because of my previous experience with the design of single-core processors. Although it will require a scheduler to access the MTS, the experience and knowledge gained from this task will be greatly beneficial for future projects.
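To make the scheduling requirement concrete, the following behavioural sketch models one possible policy, round-robin arbitration, for granting nodes access to a shared MTS. The report does not fix a scheduling policy at this stage, so the policy, the class name `RoundRobinArbiter`, and its interface are all assumptions made for illustration.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grants the shared interface to one requesting node per cycle.

    Hypothetical model: priority rotates so that the node granted most
    recently becomes the lowest priority on the next cycle.
    """

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        self.num_nodes = num_nodes
        # Start as if the last node was just served, so node 0 goes first.
        self._last_granted = num_nodes - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted node, or None if nobody requests."""
        if len(requests) != self.num_nodes:
            raise ValueError("one request flag is required per node")
        for offset in range(1, self.num_nodes + 1):
            node = (self._last_granted + offset) % self.num_nodes
            if requests[node]:
                self._last_granted = node
                return node
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    # Nodes 0 and 1 request continuously; node 2 stays idle.
    for cycle in range(4):
        print(cycle, arbiter.grant([True, True, False]))
```

With two nodes requesting on every cycle, the grants alternate between them, which is the fairness property that makes round-robin a common starting point for shared-bus arbitration.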

### 2.3 Network-on-chip Architectures

Network-on-chip (NoC) architectures implement on-chip communication mechanisms that are based on network communication principles, such as routing, switching, and massive scalability [?]. NoCs can generally support hundreds to millions of processing cores. Figure 2.3 shows an example 16-core network-on-chip architecture. NoCs can scale to very large sizes without sacrificing performance because each processor core is able to drive the network rather than needing to wait for a shared bus to become free before doing so.

The greater the number of cores in a network-on-chip design, the more quality of service (QoS) problems arise. As such, network-on-chip architectures suffer the same problems as conventional networks, such as fairness and throughput [?].



Figure 2.3: A multiprocessor network-on-chip architecture with 16 processing nodes. Nodes are connected in a grid formation with routers and links. Image source: [?].

## Chapter 3

# **Project Overview**

- 3.1 Project Deliverables
  - 3.1.1 Core Deliverables (CD)
  - 3.1.2 Extended Deliverables (ED)
- 3.2 Project Timeline
  - 3.2.1 Project Stages
  - 3.2.2 Project Stage Detail
  - 3.2.3 Timeline
- 3.3 Resources
  - 3.3.1 Hardware Resources
  - 3.3.2 Software Resources
- 3.4 Legal and Ethical Considerations

This chapter discusses the project's requirements, goals, and structure.

### 3.1 Project Deliverables

The project's deliverables are split into two categories. Core deliverables (CD) must each be satisfied for the project to be a minimum viable product (MVP). Extended deliverables (ED) are not required for an MVP and only improve upon existing features.

#### 3.1.1 Core Deliverables (CD)

The project's core deliverables are described below.

#### CD1 Design a compact 16-bit RISC instruction set architecture.

The instruction set will be the primary interface to control the processor from software. An instruction set will be required to implement the custom multi-core communication interface.

It was decided to design a new instruction set rather than to extend an existing architecture as this will increase my knowledge of the constraints to consider when designing instruction sets and processors.

#### CD2 Design and implement a Verilog RISC core that implements the ISA in CD1.

The Verilog RISC core will be able to run software programs written for the instruction set architecture.

#### CD3 Design and implement an on-chip interconnect for multi-core processing (2 to 32 cores) using the RISC core from CD2.

The interconnect will be a chief requirement to enable multi-core communication. The interconnect should support up to 32 cores; however, limited FPGA resources may constrain this in practice.

The interconnect will control communication between the cores to enable software parallelism.

#### CD4 Analyse performance of serial and parallel software algorithms, such as parallel DFT, on the processor.

To evaluate the effectiveness of the developed solution, serial and parallel implementations of a simple computing algorithm (such as parallel reduction or sorting) will be run on the processor and their performance analysed. Effectiveness will be rated on total algorithm run-time and the speed-up gained by adding more cores.
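The sketch below shows how the speed-up of a parallel reduction might be estimated before any hardware measurements exist. It is an idealised step-count model under stated assumptions: every addition costs one step, items are distributed evenly across cores, partial sums are combined pairwise in a tree, and communication over the interconnect is free. The function names are hypothetical, and real run-times on the processor are expected to differ.

```python
def serial_reduction_steps(num_items: int) -> int:
    """Additions needed to sum num_items values on a single core."""
    return max(num_items - 1, 0)


def parallel_reduction_steps(num_items: int, num_cores: int) -> int:
    """Steps on the critical path of an idealised parallel reduction.

    Each core first sums its own chunk, then the partial sums are
    combined pairwise in a tree. Communication cost is ignored.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    active = min(num_cores, num_items)  # cores that receive at least one item
    if active <= 1:
        return serial_reduction_steps(num_items)
    largest_chunk = -(-num_items // active)    # ceiling division
    local_steps = largest_chunk - 1
    combine_steps = (active - 1).bit_length()  # ceil(log2(active))
    return local_steps + combine_steps


def speedup(serial_steps: int, parallel_steps: int) -> float:
    """Speed-up defined as serial run-time divided by parallel run-time."""
    if parallel_steps <= 0:
        raise ValueError("parallel_steps must be positive")
    return serial_steps / parallel_steps


if __name__ == "__main__":
    items = 1024  # assumed problem size for illustration
    serial = serial_reduction_steps(items)
    for cores in (1, 2, 4, 8, 16, 32):
        parallel = parallel_reduction_steps(items, cores)
        print(f"cores={cores:2d}  steps={parallel:4d}  "
              f"speed-up={speedup(serial, parallel):.2f}")
```

The same `speedup` ratio applies unchanged to measured cycle counts once the serial and parallel programs have been run on the processor.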

#### CD5 Allow the RISC core to be easily compiled to multiple FPGA vendors (Xilinx, Altera).

The developed solution should be generic and portable to allow it to be used across a wide range of FPGA vendors and devices.

Verilog is a generic, implementation-independent hardware description language, so implementation-specific modules should be avoided where possible.

A key consideration for this requirement is the varying hard IP provided by the FPGA vendors (such as BRAM, Ethernet, and PCIe [??]). To overcome this problem, the developed Verilog code will use conditional compilation where vendor-specific requirements are present.

#### 3.1.2 Extended Deliverables (ED)

The project's extended deliverables are described below.

- **ED1** Design a RISC core with an instructions-per-clock (IPC) rating of at least 1.0 (a single-cycle CPU).
- **ED2** Design a RISC core with a pipelined data path to increase the design's clock speed.
- **ED3** Design a scalable multi-core interconnect supporting arbitrary (more than 32) RISC core instances (manycore) using Network-on-Chip (NoC) architecture.
- **ED4** Design a compiler backend for the PRCO304 [?] compiler to support the ISA from CD1. This will make it easier to build complex multi-core software for the processor.
- **ED5** The RISC core can communicate with peripherals via memory-mapped addresses using the Wishbone bus.
- **ED6** Implement various memory-mapped peripherals such as UART, GPIO, LCD, to aid visual representation of the processor during the demonstration viva.
- **ED7** Store instruction memory in SPI flash.
- **ED8** Reprogram instruction memory at runtime from host computer.
- **ED9** Processor external debugger using host-processor link.

### 3.2 Project Timeline

### 3.2.1 Project Stages

The project is split into stages to aid planning and management. There are 8 stage areas: 1. Initial project conception; 2. Basic RISC core development; 3. Extended RISC core development; 4. Multi-core development; 5. Processor quality-of-life (QoL) improvements; 6. Compiler development; 7. Demo preparation; and 8. Final report.

The project stages are shown in Table 3.1.

| Stage | Title                                        | Start Date | Days | Core | Applicable Deliverables |
|-------|----------------------------------------------|------------|------|------|-------------------------|
| 1.0   | Research                                     | Feb 04     | 7    | x    |                         |
| 1.1   | Requirement gathering/review                 | Feb 11     | 14   | x    |                         |
| 1.1   | Processor specification, architecture, ISA   | Feb 18     | 100  | x    | CD1                     |
| 1.2   | Stage/Time Allocation Planning               | Feb 25     | 7    | x    |                         |
| 2.1   | Decoder, Register Set, impl & integration    | Feb 25     | 14   | x    | CD2                     |
| 2.2   | Register set impl & integration              | Mar 04     | 14   | x    | CD2                     |
| 2.3   | Local memory impl & integration              | Mar 11     | 14   | x    | CD2                     |
| 3.1   | Memory mapped register layout & impl         | Apr 01     | 21   |      | ED5                     |
| 3.2   | Wishbone peripheral bus connected to MMU     | Apr 08     | 21   |      | ED5                     |
| 3.3   | Pipelined implementation and verification    | Apr 15     | 21   |      | ED2                     |
| 3.4   | Cache memory design & impl                   | Apr 22     | 28   |      | ED2                     |
| 4.1   | Multi-core communication interface           | TBD        | TBD  | x    | CD3                     |
| 4.2   | Shared-memory controller                     | TBD        | TBD  | x    | CD3                     |
| 4.3   | Scalable multi-core interface (10s of cores) | TBD        | TBD  | x    | CD3                     |
| 4.4   | Multi-core example program (reduction)       | TBD        | TBD  | x    | CD4                     |
| 5.1   | SPI-FPGA interface for OTG programming       | TBD        | TBD  |      | ED7                     |
| 5.2   | FPGA-PC interfacing                          | TBD        | TBD  |      | ED9                     |
| 5.3   | FPGA-PC debugging (instruction breakpoints)  | TBD        | TBD  |      | ED9                     |
| 6.1   | Compiler backend for vmicro16                | TBD        | TBD  |      | ED4                     |
| 6.2   | Compiler support for multi-core codegen      | TBD        | TBD  |      | ED4                     |
| 7.1   | Wishbone peripherals for demo                | TBD        | TBD  | x    | CD4                     |
| 8.1   | Final Report                                 | TBD        | TBD  | x    |                         |

 Table 3.1: Project stages throughout the life cycle of the project.

#### 3.2.2 Project Stage Detail

#### Stages 1.0 through 1.2 - Research and Project Conception

These stages cover initial research into existing problems and solutions in the multiprocessor area. The instruction set architecture that later stages will implement is also proposed.

#### Stages 2.1 through 2.3 - Processor module Design, Implementation, and Integration

These stages cover the design, implementation, and integration of key processor core modules such as the instruction decoder, register sets and local memory. Integration of all the modules is a challenging task because some modules have both asynchronous and synchronous signals that need to be timed correctly in order for other modules to receive valid data. An example of this is the register set which has asynchronous read ports that are later clocked in the instruction decode stage.

#### Stages 3.1 through 3.4 – Advanced Processor Implementation

These stages add advanced features to the processor to provide a more functional product. Although these stages are classified as extended, the technical effort required to design and implement them is modest, and so they have time allocations in the project schedule. The extended features that these stages introduce are: pipelined processor stages, to drastically increase processor performance; a memory-mapped peripheral interface through the MMU; a Wishbone master interface to the MMU, allowing external peripherals such as GPIO and LCD displays to be used in a modular fashion; and a cache memory for each processor core.

#### Stages 4.1 through 4.4 - Multiprocessor Functionality

These stages are dedicated to adding multiprocessor functionality using a loosely coupled architecture to the processor.

#### Stages 5.1 through 5.3 - Debugging Features

These stages cover debugging features and are classified as extended due to the large development time required to implement them as well as not being related to multiprocessor systems.

#### Stages 6.1 through 6.2 - Compiler Backends

These stages cover the implementation of a compiler backend to ease software writing and programming of the processor.

#### Stage 7.1 – Wishbone Peripherals

Additional Wishbone peripherals, such as SPI and timers, will be added to produce a more useful multiprocessor system.

#### Stage 8.1 – Final Report

This stage is dedicated to the final report write-up. It is expected to be an iterative task that is active throughout the lifespan of the project.

#### 3.2.3 Timeline

The project stages from Table 3.1 are displayed below in a Gantt chart.



Figure 3.1: Project stages in a Gantt chart.

### 3.3 Resources

This section describes the hardware and software resources required to fulfil the project.

#### 3.3.1 Hardware Resources

Core deliverable CD5 requires the designed RISC core to be implemented and demonstrated on multiple FPGA devices. Although my design should synthesise for physical IC implementation, this is not a primary development target because of high costs and lengthy production times. Because I have past experience with Xilinx FPGAs from my placement work and with Altera from university modules, it was decided to target the Xilinx Spartan 6 XC6SLX9 and the Altera Cyclone V.

#### Terasic DE1-SoC Development Board

The Terasic DE1-SoC development board features a large Cyclone V FPGA and many peripherals, such as seven-segment displays, 64 MB SDRAM, ADCs, and buttons and switches, which will aid demonstration of the project. The development board is available through the university so the cost is negligible. Figure 3.2 shows the peripherals (green) available to the FPGA.

#### Minispartan 6+ FPGA Development Board

The Minispartan 6+ is a hobbyist FPGA development board with fewer peripherals than the DE1-SoC. The board features a Xilinx Spartan 6 XC6SLX9, which has far fewer resources than the DE1-SoC's Cyclone V; however, its simplicity and my familiarity with Xilinx's software suite will speed up development. The development board is shown in Figure 3.3.

Figure 3.2: Terasic DE1-SoC development board featuring the Altera Cyclone V FPGA and many peripherals. Image source: [2].



Figure 3.3: Minispartan-6+ development board featuring the Xilinx Spartan 6 XC6SLX9. Note that the XC6SLX9 and XC6SLX25 FPGAs share the same board. Image source: [3].

#### 3.3.2 Software Resources

#### **Intel Quartus**

Intel Quartus Prime is a paid-for SoC, CPLD, and FPGA software suite targeting Intel's Stratix, Arria, and Cyclone based FPGAs. The university provides student licences which will be used via VPN.

#### Xilinx ISE Webpack

Xilinx ISE Webpack is Xilinx's free software suite for FPGA development on Spartan 6 based FPGAs. Due to ISE's intuitive and fast workflow, most of the initial simulation and verification will be performed using ISE. This will greatly improve development times.

#### Verilator

Verilator is an open-source Verilog to C++ transpiler which provides a C++ interface to simulate Verilog modules and read/write values similar to a test bench. Verilator will be used for specific modules within the RISC core such as the ALU and decoder as Verilator is useful when performing exhaustive verification.

### 3.4 Legal and Ethical Considerations

The RISC core is designed to be used as an academic research and educational tool to aid learning and understanding of RISC and multi-core machines. It should not be used in roles where mission-critical operation or safety is a factor.

The processor does not provide any memory protection features and any software running on the processor has full access to all memory.

The processor does not store/track/predict software instructions. The processor uses pipelining techniques to improve performance, which results in future instructions entering the pipeline even if the software's logical sequence does not include these instructions. This could result in security vulnerabilities similar to the Spectre vulnerability [4].

## Chapter 4

# **Current Progress**

- 4.1 RISC Core
  - 4.1.1 Instruction Set Architecture
  - 4.1.2 Design and Implementation
  - 4.1.3 Verification

This chapter discusses the current progress made towards the project, including designs, implementation, and current results.

### 4.1 RISC Core

Following the project timeline described in section 3.2, the first couple of months have been dedicated to the design and implementation of the instruction set architecture and RISC core in stages 1-3. Good progress has been made on both deliverables, the ISA and the RISC core, and progress is on schedule with the initial project timeline. The core has been nicknamed *Vmicro16*, short for Verilog microprocessor 16-bit.

#### 4.1.1 Instruction Set Architecture

A 16-bit instruction set architecture (ISA) has been designed using an iterative approach. There are currently 32 unique instructions covering most generic RISC operations (add, load/store, branch, compare, etc.) and at least 16 opcodes available to provide multi-core communication and functionality. This number should be adequate to support these features when the work begins on the multi-core project stages (stages 4-7).

#### **Design Goals**

Having past experience designing and implementing ISAs for previous projects, I wanted to use that knowledge to design an even more efficient and compact instruction set that could provide much greater functionality. The technical design goals of the ISA are described below:

#### ISA1 Use a fixed width of 16-bits for all instructions.

This will significantly reduce RTL resources and encourage efficiency by not wasting spare bits. In addition, many SPI flash and RAMs support 16-bit wide data reads which will allow each instruction fetch to only require one clock cycle, thus increasing processor performance.

#### ISA2 Be able to select at least two registers for common instructions.

This will reduce the number of instructions required to manipulate register data. A disadvantage of using two instead of three register selects is that instructions are always destructive: they always *destroy* existing data in the destination register (e.g. R0 = ADD R0 R1), unlike constructive instructions that provide a unique register select for the destination (e.g. R2 = ADD R0 R1).

#### ISA3 Reduce bit-space for frequently used instructions (MOV, MOVI, ADD).

Due to the 16-bit limit, two register selects, and immediate values, the opcode bits are reduced, resulting in fewer unique instructions. To overcome this constraint, spare bits in other instructions will be appended to the opcode bits to extend the opcode range. This, however, will require a more complex decoder that must first switch on the opcode, then switch on any spare bits to determine the final opcode. This method will significantly increase the number of unique instructions provided by the instruction set.
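The two-level decode described above can be sketched in software as follows. This is a behavioural model only, not the Verilog decoder: it uses the field layout (opcode in bits 15-11, Rd in bits 10-8, Ra in bits 7-5, and a 5-bit field in bits 4-0) and the NOP, LW, SW, and BIT encodings from the instruction table at the end of this section. The function and dictionary names are hypothetical, and all other opcodes are reported as unknown.

```python
# Primary opcodes (bits 15-11) taken from the instruction table.
PRIMARY_OPCODES = {
    0b00000: "NOP",
    0b00001: "LW",
    0b00010: "SW",
    0b00011: "BIT",
}

# For the BIT family, bits 4-0 extend the opcode to select the operation.
BIT_SUBOPS = {
    0b00000: "BIT_OR",
    0b00001: "BIT_XOR",
    0b00010: "BIT_AND",
    0b00011: "BIT_NOT",
    0b00100: "BIT_LSHFT",
    0b00101: "BIT_RSHFT",
}


def decode(instr: int) -> dict:
    """Split a 16-bit instruction word into fields and resolve its mnemonic."""
    if not 0 <= instr <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    opcode = (instr >> 11) & 0x1F
    rd = (instr >> 8) & 0x7
    ra = (instr >> 5) & 0x7
    low5 = instr & 0x1F

    # First switch: the primary opcode.
    name = PRIMARY_OPCODES.get(opcode, "UNKNOWN")
    # Second switch: spare bits select the final operation.
    if name == "BIT":
        name = BIT_SUBOPS.get(low5, "UNKNOWN")
    return {"name": name, "rd": rd, "ra": ra, "low5": low5}


if __name__ == "__main__":
    # 00011 001 010 00001 encodes BIT_XOR with Rd = R1 and Ra = R2.
    print(decode(0b0001100101000001))
```

The second lookup is what allows six bitwise operations to share a single primary opcode, at the cost of the extra decode level mentioned above.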

#### ISA4 Provide frequently used actions as options for existing instructions.

In software, frequently used actions include incrementing/decrementing by 1 and performing logical comparisons, which usually take more than one instruction on some RISC architectures. As they are common actions, the instruction overhead and time may be significant and can affect performance. To provide a solution to this problem, in addition to using spare bits to extend the opcode range, spare bits will be used to signify a frequently used action to be performed by the ALU.

As shown in Figure 4.1, frequently used operations such as incrementing/decrementing and logical comparisons are provided by setting spare bits to special values. For example, the instructions ARITH\_UADDI and ARITH\_SSUBI extend the ARITH\_U and ARITH\_S opcodes by using the spare bit, bit 4. If this bit is not set (0), the instruction allows a 4-bit immediate value to be added in addition to the two register selects. The 4-bit immediate allows a small number to be added in the ALU, which is useful in software for loops where an increment/decrement of more than 1 is required.

Another example is the SETC instruction. Inspired by Intel's x86 SETcc, the instruction sets the destination register to zero or one depending on the flags produced by the CMP instruction. Without this instruction, multiple branches would be required to convert the comparison's flags to logical zeros and ones.

#### ISA5 Provide instructions for performing bitwise manipulations.

RISC processors are commonly used for microprocessor and microcontroller tasks, which typically include bit manipulation. The ISA provides bitwise OR, XOR, AND, NOT, and shifting instructions under a single opcode to fill this need.
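The intended register-level effect of the BIT instruction family can be modelled as below, following the operation column of the instruction table. Two details are assumptions not stated in the report: registers are taken to be 16 bits wide, and a shift amount of 16 or more is taken to produce zero. The helper names are hypothetical.

```python
MASK16 = 0xFFFF  # assumed register width of 16 bits


def _shift_left(rd: int, ra: int) -> int:
    # Assumption: shifting by 16 or more clears the register.
    return (rd << ra) & MASK16 if ra < 16 else 0


def _shift_right(rd: int, ra: int) -> int:
    # Logical (zero-filling) right shift; same assumption for large shifts.
    return rd >> ra if ra < 16 else 0


# Each entry maps (Rd, Ra) to the new value written to Rd.
BIT_OPS = {
    "BIT_OR": lambda rd, ra: rd | ra,
    "BIT_XOR": lambda rd, ra: rd ^ ra,
    "BIT_AND": lambda rd, ra: rd & ra,
    "BIT_NOT": lambda rd, ra: ~ra,
    "BIT_LSHFT": _shift_left,
    "BIT_RSHFT": _shift_right,
}


def execute_bit(mnemonic: str, rd_val: int, ra_val: int) -> int:
    """Return the 16-bit result that the instruction would write to Rd."""
    rd_val &= MASK16
    ra_val &= MASK16
    return BIT_OPS[mnemonic](rd_val, ra_val) & MASK16


if __name__ == "__main__":
    print(hex(execute_bit("BIT_OR", 0x00F0, 0x0F00)))
    print(hex(execute_bit("BIT_NOT", 0x0000, 0x00FF)))
```

Masking the result keeps the model within 16 bits, which matters for `BIT_NOT` and `BIT_LSHFT` because Python integers are unbounded.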

#### ISA6 Provide instructions for explicitly performing signed and unsigned arithmetic.

Performing signed and unsigned arithmetic is a key requirement for RISC applications and so it was decided to provide such instructions. Software programmers can easily switch between signed and unsigned arithmetic by setting bit 11 in the ARITH instruction family. Being able to change between signed and unsigned arithmetic instructions by changing a single bit will make the RISC processor's decoder module smaller and less complex.

Without explicit unsigned and signed instructions, extra instructions would be required to perform addition and subtraction. In addition, due to two's complement representation of signed numbers, the highest immediate operand value would be halved, resulting in more instructions to reach the desired value.

| Instruction | 15-11 | 10-8  | 7-5       | 4-0   | Operation                 |
|-------------|-------|-------|-----------|-------|---------------------------|
|             | 15-11 | 10-8  | 7-5       | 4-0   | rd ra simm5               |
|             | 15-11 | 10-8  | 7-0       |       | rd imm8                   |
|             | 15-11 | 10-0  |           |       | nop                       |
|             | 15    | 14-12 | 11-0      |       | extended immediate        |
| NOP         | 00000 |       | X         |       |                           |
| LW          | 00001 | Rd    | Ra        | s5    | Rd <= RAM[Ra+s5]          |
| SW          | 00010 | Rd    | Ra        | s5    | RAM[Ra+s5] <= Rd          |
| BIT         | 00011 | Rd    | Ra        | s5    | bitwise operations        |
| BIT_OR      | 00011 | Rd    | Ra        | 00000 | Rd <= Rd \| Ra            |
| BIT_XOR     | 00011 | Rd    | Ra        | 00001 | Rd <= Rd ^ Ra             |
| BIT_AND     | 00011 | Rd    | Ra        | 00010 | Rd <= Rd & Ra             |
| BIT_NOT     | 00011 | Rd    | Ra        | 00011 | Rd <= ~Ra                 |
| BIT_LSHFT   | 00011 | Rd    | Ra        | 00100 | Rd <= Rd << Ra            |
| BIT_RSHFT   | 00011 | Rd    | Ra        | 00101 | Rd <= Rd >> Ra            |
| MOV         | 00100 | Rd    | Ra        | X     | Rd <= Ra                  |
| MOVI        | 00101 | Rd    | i8        |       | Rd <= i8                  |
| ARITH_U     | 00110 | Rd    | Ra        | s5    | unsigned arithmetic       |
| ARITH_UADD  | 00110 | Rd    | Ra        | 11111 | Rd <= uRd + uRa           |
| ARITH_USUB  | 00110 | Rd    | Ra        | 10000 | Rd <= uRd - uRa           |
| ARITH_UADDI | 00110 | Rd    | Ra        | 0AAAA | Rd <= uRd + uRa + AAAA    |
| ARITH_S     | 00111 | Rd    | Ra        | s5    | signed arithmetic         |
| ARITH_SADD  | 00111 | Rd    | Ra        | 11111 | Rd <= sRd + sRa           |
| ARITH_SSUB  | 00111 | Rd    | Ra        | 10000 | Rd <= sRd - sRa           |
| ARITH_SSUBI | 00111 | Rd    | Ra        | 0AAAA | Rd <= sRd - sRa + AAAA    |
| BR          | 01000 | Rd    | i8        |       | conditional branch        |
| BR_U        | 01000 | Rd    | 0000 0000 |       | Any                       |
| BR_E        | 01000 | Rd    | 0000 0001 |       | Z=1                       |
| BR_NE       | 01000 | Rd    | 0000 0010 |       | Z=0                       |
| BR_G        | 01000 | Rd    | 0000 0011 |       | Z=0 and S=O               |
| BR_GE       | 01000 | Rd    | 0000 0100 |       | S=O                       |
| BR_L        | 01000 | Rd    | 0000 0101 |       | S != O                    |
| BR_LE       | 01000 | Rd    | 0000 0110 |       | Z=1 or (S != O)           |
| BR_S        | 01000 | Rd    | 0000 0111 |       | S=1                       |
| BR_NS       | 01000 | Rd    | 0000 1000 |       | S=0                       |
| CMP         | 01001 | Rd    | Ra        | X     | SZO <= CMP(Rd, Ra)        |
| SETC        | 01010 | Rd    | Ra        | X     | Rd <= Imm8 == SZO ? 1 : 0 |
| MOVI_LARGE  | 1     | Rd    | i12       |       | Rd <= i12                 |

Figure 4.1: Initial Vmicro16 16-bit instruction set architecture. Coloured regions represent instruction families (bitwise, branching, arithmetic, etc.).

The ISA table is shown in Figure 4.1. The top 5 bits (15-11) are dedicated to the opcode, giving 32 unique values. Currently only bits 14-11 are used (NOP to SETC), leaving the top bit spare. Initially, this bit was reserved to indicate an extended immediate instruction, MOVI12, supporting a large 12-bit immediate value. Later in the design, however, it was decided that the top bit would instead indicate special instructions dedicated to multi-core operation. This leaves 16 spare unique opcodes for this purpose.

#### 4.1.2 Design and Implementation

The RISC core design is a traditional 5-stage processor (fetch, decode, execute, memory, write-back).

To satisfy CD5, the Verilog code will be self-contained in a single file. This reduces the hierarchical complexity and eases cross-vendor project set-up as only a single file is required to be included. A disadvantage with this single file approach is that some external Verilog verification tools that I plan to use, such as Verilator, do not currently support multiple Verilog modules (due to an unfixed bug) within a single file.



Figure 4.2: Vmicro16 RISC 5-stage RTL diagram showing: instruction pipelining (data passed forward through clocked register banks at each stage); branch address calculation; ALU operand calculation (rd2 or imm); and program counter incrementing.

#### **Instruction and Data Memory**

The design uses separate instruction and data memories, similar to a Harvard architecture computer. This architecture was chosen because I find it easier to implement.

#### Register File

To support design goal ISA2, the register set features a dual-port read and single-port write. This allows instructions to read 2 registers simultaneously for any instruction. The single-port write allows the instruction output to be written to the register file.
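The dual-read, single-write arrangement described above can be sketched as follows. This is a minimal illustration rather than the code in vmicro16.v: the module name `regfile_sketch` and all port names are assumptions, and the depth of 8 registers is inferred from the 3-bit register select fields in Figure 4.1.

```verilog
// Hypothetical sketch of a dual-port-read, single-port-write register file.
// Names and widths are illustrative assumptions, not taken from vmicro16.v.
module regfile_sketch #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    // Two independent read ports (asynchronous reads)
    input  wire [2:0]       rs1,
    input  wire [2:0]       rs2,
    output wire [WIDTH-1:0] rd1,
    output wire [WIDTH-1:0] rd2,
    // Single write port (synchronous write)
    input  wire             we,
    input  wire [2:0]       ws,
    input  wire [WIDTH-1:0] wd
);
    // 8 registers, addressed by a 3-bit select
    reg [WIDTH-1:0] regs [0:7];

    // Both operands can be read in the same cycle
    assign rd1 = regs[rs1];
    assign rd2 = regs[rs2];

    // One instruction result is written back per clock
    always @(posedge clk)
        if (we)
            regs[ws] <= wd;
endmodule
```

Keeping the reads combinational matches the asynchronous register read used by the decoder; a block-RAM based register file would instead need registered reads.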

#### **Pipelining**

Extended deliverable **ED1** is to execute at least one instruction per clock cycle. My previous processor designs have all required multiple clocks per instruction, as this is much easier to implement. Modern processors can complete one or more instructions per clock through the use of instruction pipelining. This technique increases the throughput of the processor by operating all stages in parallel. Instructions still travel through each stage in the same order; the difference is that the fetch stage does not wait for the final stage to complete and instead fetches a new instruction every clock cycle, so each stage operates on new data every clock cycle. **ED1** was proposed to extend my knowledge of CPU pipelining.
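As an illustration of how data moves between stages, the sketch below shows one clocked register bank of the kind that separates two pipeline stages. It is a simplified assumption of the structure, not the implementation in vmicro16.v; the module name `pipe_reg_sketch` and the `stall`, `in_valid`, and `out_valid` signals are hypothetical names.

```verilog
// Hypothetical sketch of a register bank between two pipeline stages.
// Every clock, the bank captures the previous stage's output unless stalled.
module pipe_reg_sketch #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire             reset,
    input  wire             stall,     // hold the current contents
    input  wire             in_valid,  // previous stage produced usable data
    input  wire [WIDTH-1:0] in_data,
    output reg              out_valid,
    output reg  [WIDTH-1:0] out_data
);
    always @(posedge clk) begin
        if (reset) begin
            out_valid <= 1'b0;
            out_data  <= {WIDTH{1'b0}};
        end else if (!stall) begin
            // Normal operation: new data enters this stage every clock
            out_valid <= in_valid;
            out_data  <= in_data;
        end
        // When stalled, both registers keep their previous values
    end
endmodule
```

Chaining four such banks (IFID, IDEX, EXME, MEWB) gives each of the five stages new data every clock cycle while preserving instruction order.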

Instruction pipelining is harder to implement as data and control hazards can occur. Data hazards occur when instructions are dependent on the output of a previous instruction that has not left the pipeline, for example a register dependency. Methods to detect this hazard include checking if the register selects in the decode stage are present in later stages of the pipeline. If this check is true, then the current instruction depends on an instruction in the pipeline, and the processor can either wait until the instruction it depends on has left the pipeline (i.e. has been written back to registers) or insert a NOP that will produce a *bubble* in the pipeline, allowing the final stage to execute before the dependent instruction continues.

Control hazards occur when conditional or interrupt branching instructions are in the pipeline and their result has not been calculated yet. This results in subsequent instructions entering the pipeline when they should not be executed due to the conditional branch. To detect this hazard, a global flag is set for instructions that perform branching or conditional execution. Only once the outcome of the conditional check is known are stages after decode allowed to commit their results. Fortunately, this technique is fairly simple to implement.

This project's RISC processor implements these two hazard detectors and solutions to resolve them. The data hazard resolver implements a valid signal that is passed forward from stage to stage. This signal is low when a hazard has occurred and indicates that the receiving stage should not operate on the previous stage's data. Each stage's valid signal is dependent on the previous stage's valid signal. This allows later stages to stall when a hazard is detected in previous stages. A diagram of the implementation of these hazards in the processor is shown in Figure 4.3.
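The register-select comparison described above can be sketched as a small combinational block. This is an assumed simplification that checks only two later stages; the module name `hazard_sketch` and every signal name are hypothetical and do not come from vmicro16.v.

```verilog
// Hypothetical sketch of data hazard detection by comparing register selects.
// A stall is requested when the instruction in decode reads a register that
// a valid, register-writing instruction further down the pipeline has not
// yet written back.
module hazard_sketch (
    input  wire [2:0] id_rs1,    // source selects of the instruction in decode
    input  wire [2:0] id_rs2,
    input  wire [2:0] ex_rd,     // destination select in the execute stage
    input  wire       ex_we,     // execute-stage instruction writes a register
    input  wire       ex_valid,  // execute-stage data is valid
    input  wire [2:0] me_rd,     // destination select in the memory stage
    input  wire       me_we,
    input  wire       me_valid,
    output wire       stall
);
    // Match against the instruction in execute
    wire ex_hit = ex_valid && ex_we &&
                  ((ex_rd == id_rs1) || (ex_rd == id_rs2));

    // Match against the instruction in memory
    wire me_hit = me_valid && me_we &&
                  ((me_rd == id_rs1) || (me_rd == id_rs2));

    // Any match means the dependent instruction must wait
    assign stall = ex_hit || me_hit;
endmodule
```

Qualifying each comparison with the stage's valid signal prevents bubbles already in the pipeline from causing further stalls.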

#### Memory Management Unit

It was decided to use a memory management unit (MMU) to make communication with external peripherals or additional registers easier and more extensible. This method transparently uses the existing LW/SW instructions, which removes the requirement for a unique instruction for each peripheral.

Figure 4.3: Pipeline data hazard detection. The register selects are passed forward through each stage and compared to the IDEX (latest instruction) register selects. If they match, the latest instruction depends on the output of an instruction still in the pipeline, so the IFID and IDEX stages are stalled to allow that instruction to commit.

#### **Proposed Memory Mapped Addresses**

The peripheral addresses are currently based on classes. For example, a memory-mapped address may use the upper byte to address a peripheral and the lower byte to address a register/function in that peripheral.

Later in the project, I plan to rewrite the addressing scheme to use a simpler address format which is closer to commonly used peripheral addressing schemes used today. The proposed memory mapped addresses for each system and peripheral are listed below.

| Address (16-bit aligned) | Peripheral Name                                                                |
|--------------------------|--------------------------------------------------------------------------------|
| 0x0000                   | NOP (reads return 0, writes do nothing)                                        |
| 0x00ZZ                   | Per-core scratch RAM (ZZ = 8-bit RAM address)                                  |
| 0x0100                   | Extended Core Registers 1                                                      |
| 0x0200                   | Extended Core Registers 2                                                      |
| 0x03ZZ                   | Wishbone Master controller select (ZZ contains 8-bit Wishbone slave address)   |
| 0x1XYZ                   | Master core controller (X = slave select, Y = instruction, Z = data)           |

Table 4.1: Provisional memory-mapped addresses table.
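A class-based decode of the provisional map in Table 4.1 might look like the sketch below, where the upper byte selects the peripheral and the lower byte is passed through as the register address. The module name `mmu_decode_sketch`, the `sel_*` outputs, and the choice to give the 0x0000 NOP address priority over the scratch RAM range are assumptions made for illustration only.

```verilog
// Hypothetical sketch of address decoding for the provisional memory map.
// The upper byte selects a peripheral class; the lower byte addresses a
// register or function within it.
module mmu_decode_sketch (
    input  wire [15:0] addr,
    output wire        sel_nop,     // 0x0000
    output wire        sel_ram,     // 0x00ZZ per-core scratch RAM
    output wire        sel_ext1,    // 0x01ZZ extended core registers 1
    output wire        sel_ext2,    // 0x02ZZ extended core registers 2
    output wire        sel_wb,      // 0x03ZZ Wishbone master controller
    output wire        sel_master,  // 0x1XYZ master core controller
    output wire [7:0]  sub_addr     // lower byte passed to the peripheral
);
    wire [7:0] cls = addr[15:8];

    assign sel_nop    = (addr == 16'h0000);
    // Assumption: address 0x0000 is reserved for NOP, so RAM excludes it
    assign sel_ram    = (cls == 8'h00) && !sel_nop;
    assign sel_ext1   = (cls == 8'h01);
    assign sel_ext2   = (cls == 8'h02);
    assign sel_wb     = (cls == 8'h03);
    assign sel_master = (addr[15:12] == 4'h1);

    assign sub_addr   = addr[7:0];
endmodule
```

Because every select is derived from a fixed bit field, adding a peripheral only requires one more comparison and leaves the LW/SW instructions unchanged.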

#### **ALU Design**

The Vmicro16's ALU is an asynchronous module that has 3 inputs: data a; data b; and opcode op, and outputs data value c. The ALU is able to operate on both register data (rd1 and rd2) and immediate values. A switch is used to set the b input to either the rd2 or imm value from the previous stage.

Currently, the ALU does not store flags to indicate overflow, equality, or zero values in the module itself. Instead the ALU outputs the result of the CMP, which calculates such flags, to be written back to the register set in the write-back stage. This means that in order to perform a conditional operation, such as a branch, the register containing the CMP flags must be included in the instruction.
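The behaviour described above can be sketched as a purely combinational module with inputs a, b, and op and output c. This is an illustrative assumption rather than the vmicro16\_alu module itself: the `ALU_*` encodings, the packing of the S, Z, and O flags into the low three bits of c, and the separate operand multiplexer are all hypothetical.

```verilog
// Hypothetical sketch of a combinational ALU whose CMP result is returned
// as data (to be written to a register) rather than stored as flags.
module alu_sketch (
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire [2:0]  op,
    output reg  [15:0] c
);
    localparam ALU_ADD = 3'd0, ALU_SUB = 3'd1, ALU_OR  = 3'd2,
               ALU_XOR = 3'd3, ALU_AND = 3'd4, ALU_NOT = 3'd5,
               ALU_CMP = 3'd6;

    // Comparison is performed as a subtraction
    wire [15:0] diff = a - b;
    wire        sign = diff[15];
    wire        zero = (diff == 16'h0000);
    // Signed overflow of a - b: operands differ in sign and the result's
    // sign differs from a's sign
    wire        ovf  = (a[15] != b[15]) && (diff[15] != a[15]);

    always @(*) begin
        case (op)
            ALU_ADD: c = a + b;
            ALU_SUB: c = diff;
            ALU_OR:  c = a | b;
            ALU_XOR: c = a ^ b;
            ALU_AND: c = a & b;
            ALU_NOT: c = ~b;
            ALU_CMP: c = {13'b0, sign, zero, ovf};  // assumed SZO packing
            default: c = 16'h0000;
        endcase
    end
endmodule

// Hypothetical operand switch: selects rd2 or the immediate as ALU input b.
module alu_operand_mux_sketch (
    input  wire [15:0] rd2,
    input  wire [15:0] imm,
    input  wire        use_imm,
    output wire [15:0] b
);
    assign b = use_imm ? imm : rd2;
endmodule
```

Returning the flags as ordinary data keeps the ALU stateless, at the cost of requiring the branch instruction to name the register that holds them.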



Figure 4.4: Vmicro16 ALU diagram showing clocked inputs from the previous IDEX stage being

The Verilog implementation of the ALU is shown in Figure 4.5. The ALU's asynchronous output is clocked with other registers, such as destination register rs1 and other control signals, in the EXME register bank.

```
        end
        MMU_STATE_T3: begin
            // Slave has output a ready signal (finished)
            M_PENABLE <= 0;
            M_PADDR   <= 0;
            M_PWDATA  <= 0;
            M_PSELx   <= 0;
            M_PWRITE  <= 0;
            // Clock the peripheral output into a reg,
            //   to output on the next clock cycle
            per_out   <= M_PRDATA;

            mmu_state <= MMU_STATE_T1;
```

Figure 4.5: Vmicro16's ALU implementation named vmicro16\_alu. vmicro16.v

#### **Decoder Design**

Instruction decoding occurs between the IFID and IDEX stages. The decoder extracts register selects and operands from the input instruction. The decoder outputs are asynchronous, which allows the register selects to be passed to the register set and register data to be read asynchronously. The register selects and register read data are then clocked into the IDEX register bank.

```
    // more luts than below but easier
    //wire tim0_en = (mmu_addr >= `DEF_MMU_TIM0_S)
    //            && (mmu_addr <= `DEF_MMU_TIM0_E);
    //wire sreg_en = (mmu_addr >= `DEF_MMU_SREG_S)
    //            && (mmu_addr <= `DEF_MMU_SREG_E);
    //wire intv_en = (mmu_addr >= `DEF_MMU_INTSV_S)
    //            && (mmu_addr <= `DEF_MMU_INTSV_E);
    //wire intm_en = (mmu_addr >= `DEF_MMU_INTSM_S)
    //            && (mmu_addr <= `DEF_MMU_INTSM_E);

    wire tim0_en = ~mmu_addr[12] && ~mmu_addr[9] && ~mmu_addr[7];
    wire sreg_en = mmu_addr[7] && ~mmu_addr[4] && ~mmu_addr[5];
    wire intv_en = mmu_addr[8] && ~mmu_addr[3];
    wire intm_en = mmu_addr[8] && mmu_addr[3];
    wire apb_en  = !(|{tim0_en, sreg_en, intv_en, intm_en});

    wire tim0_we = (tim0_en && mmu_we);
    wire intv_we = (intv_en && mmu_we);
    wire intm_we = (intm_en && mmu_we);

    // Special register selects
```

Figure 4.6: Vmicro16's decoder module code showing nested bit switches to determine the intended opcode. vmicro16.v

In Figure 4.6, it can be seen that the first 8 opcode cases are represented using the same 15-11 bits; however, the VMICRO16\_OP\_BIT instructions require another bit range to be compared to determine the output opcode.
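The nested switch described above can be sketched as follows, using the opcode values from Figure 4.1. The module name `decode_sketch`, its ports, and the `DEC_*` internal operation codes are assumptions made for illustration and are not taken from vmicro16.v.

```verilog
// Hypothetical sketch of a two-level opcode decode for a 16-bit instruction.
// The first switch examines bits 15-11; instruction families that share an
// opcode are resolved by a second switch on their spare low bits.
module decode_sketch (
    input  wire [15:0] instr,
    output wire [2:0]  rd,   // destination register select, bits 10-8
    output wire [2:0]  ra,   // source register select, bits 7-5
    output reg  [3:0]  op    // internal operation code (assumed encoding)
);
    localparam DEC_NOP = 4'd0, DEC_OR  = 4'd1, DEC_XOR = 4'd2,
               DEC_AND = 4'd3, DEC_NOT = 4'd4, DEC_MOV = 4'd5;

    assign rd = instr[10:8];
    assign ra = instr[7:5];

    always @(*) begin
        op = DEC_NOP;                      // default avoids inferred latches
        case (instr[15:11])                // first switch: primary opcode
            5'b00011:                      // BIT family shares one opcode
                case (instr[4:0])          // second switch: spare bits
                    5'b00000: op = DEC_OR;
                    5'b00001: op = DEC_XOR;
                    5'b00010: op = DEC_AND;
                    5'b00011: op = DEC_NOT;
                    default:  op = DEC_NOP;
                endcase
            5'b00100: op = DEC_MOV;        // MOV needs no second switch
            default:  op = DEC_NOP;
        endcase
    end
endmodule
```

Only the families that reuse an opcode pay for the second comparison, so the extra decoder logic grows with the number of extended instructions rather than with the whole instruction set.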

#### 4.1.3 Verification

Currently, the only verification method used is manual inspection of the output waveforms of a test bench. For now, it is easier and faster to spot erroneous states by hand due to the large complexity of the pipeline. Later in the project, automatic test benches will be utilised.

#### **Known Bugs**

Known bugs exist within the RISC core; however, none are critical as they can be easily avoided in software.

#### BUG1 Stall detection does not consider load/store instructions.

Due to instruction pipelining techniques used by the processor and lack of address checking in the EXME and MEWB stages, LW instructions immediately after SW instructions:

```
SW R0 (R2+16)
LW R1 (R2+16)
```

will not return the previously stored value. In addition, because the target address is calculated by the ALU (e.g. R2+16), detecting matching addresses at the IFID and IDEX stages is not trivial, so a hardware fix is not planned for the final version. The problem can be avoided in software by placing at least 5 NOP instructions after each SW.

## Chapter 5

## **Future Work**

- 5.1 Project Status
  - 5.1.1 Updated Project Time Line
  - 5.1.2 Future Work

### 5.1 Project Status

Four months have passed since the start of the project and significant progress has been made towards the final deliverable.



Figure 5.1: Caption for BRAMex

The current active stage is 3.3 *Pipeline Implementation and Verification*, where the processor pipeline is being verified against a range of simple software sequences. It is important that this verification is thorough and the output is bug free, as future additions to the processor will build on this foundation.

#### 5.1.1 Updated Project Time Line

The project table described in section 3.2 did not allocate times for stages 4.1 and later. This was due to expected high demand from other modules and exams in this time period and so it was decided to not allocate times that would later not be followed.

Now that this time period is closer, time allocations have been assigned for stages 4, 7, and 8. The state of stage 5's extended deliverables, to implement debugging interfaces, has changed from *Unknown* to *Cancelled* due to the expected high workload from other modules in the next month. The cancellation of these stages will not severely affect the final functionality of the deliverable; however, it will make debugging the processor slightly more difficult. It was decided to remove these extended features to allow more time to be spent on core functionality.

The updated project status is shown in Table 5.1 and in Figure 5.2.

#### 5.1.2 Future Work

May and early June are reserved for work on other modules and preparation for exams. From mid-June, work will resume on verifying the end of stage 3 and then work will start on stage 4 (focussed on designing and implementing multiprocessor features). After stage 4, software algorithms will be compiled for the ISA and evaluated against Amdahl's Law.
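For reference, Amdahl's Law in its usual form bounds the speedup obtainable from multiple cores when only part of a program can run in parallel. The symbols $S$, $N$, and $p$ are introduced here for illustration and are not defined elsewhere in this report: $S$ is the speedup, $N$ the number of cores, and $p$ the fraction of run time that can be parallelised.

$$
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
$$

For example, with $p = 0.9$ and $N = 4$, $S = 1 / (0.1 + 0.225) \approx 3.08$, and as $N \to \infty$ the speedup approaches $1 / (1 - p) = 10$.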



Figure 5.2: Updated project time gantt chart showing time allocations for stage 4.

| Stage | Title                                        | Start Date | Core | Status    |
|-------|----------------------------------------------|------------|------|-----------|
| 1.0   | Research                                     | Feb 04     | x    | Completed |
| 1.1   | Requirement gathering/review                 | Feb 11     | x    | Completed |
| 1.1   | Processor specification, architecture, ISA   | Feb 18     | x    | Completed |
| 1.2   | Stage/Time Allocation Planning               | Feb 25     | x    | Completed |
| 2.1   | Decoder, Register Set, impl & integration    | Feb 25     | x    | Completed |
| 2.2   | Register set impl & integration              | Mar 04     | x    | Completed |
| 2.3   | Local memory impl & integration              | Mar 11     | x    | Completed |
| 3.1   | Memory mapped register layout & impl         | Apr 01     |      | On-going  |
| 3.2   | Wishbone peripheral bus connected to MMU     | Apr 08     |      | On-going  |
| 3.3   | Pipeline implementation and verification     | Apr 15     |      | On-going  |
| 3.4   | Cache memory design & impl                   | Apr 22     |      | Cancelled |
| 4.1   | Multi-core communication interface           | Jun 05     | x    | Planned   |
| 4.2   | Shared-memory controller                     | Jun 05     | x    | Planned   |
| 4.3   | Scalable multi-core interface (10s of cores) | Jul 01     | x    | Planned   |
| 4.4   | Multi-core example program (reduction)       | Jul 10     | x    | Planned   |
| 5.1   | SPI-FPGA interface for OTG programming       | TBD        |      | Cancelled |
| 5.2   | FPGA-PC interfacing                          | TBD        |      | Cancelled |
| 5.3   | FPGA-PC debugging (instruction breakpoints)  | TBD        |      | Cancelled |
| 6.1   | Compiler backend for vmicro16                | TBD        |      | Unknown   |
| 6.2   | Compiler support for multi-core codegen      | TBD        |      | Unknown   |
| 7.1   | Wishbone peripherals for demo                | Aug 01     | x    | Planned   |
| 8.1   | Final Report                                 | Jun 05     | x    | Planned   |

Table 5.1: Updated project stages.

## Chapter 6

## **Conclusion**

With the end of Moore's Law looming, processor designers must use other strategies to continue improving processor performance, with multiprocessing and parallelism being a primary strategy. This project sets out to improve my knowledge of multiprocessor communication by designing, implementing, and verifying a multiprocessor, and I believe starting from scratch is the best way to accomplish this learning task.

To date, a compact 16-bit RISC instruction set has been designed and implemented in a Verilog single-core processor. Whilst single-core verification is still on-going, good progress has been made and extended deliverables from stage 3, such as instruction pipelining and memory-mapped peripherals via a Wishbone bus, have been implemented successfully.

Stage 5's extended deliverables and the cache memory have been cancelled, but they do not affect the core functionality of the processor. The planned project time-line for future stages is realistic and accomplishing the project's goals appears achievable.


### References

- [1] V. Subramanian, "Multiple gate field-effect transistors for future CMOS technologies," *IETE Technical Review*, vol. 27, no. 6, pp. 446–454, 2010.

- [2] T. Technologies, "Soc platform cyclone de1-soc board." [Online]. Available: https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=836
- [3] *MiniSpartan6+*, Scarab Hardware, 2014. [Online]. Available: https://www.scarabhardware.com/minispartan6/
- [4] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," arXiv preprint arXiv:1801.01203, 2018.
- [5] J. Balkind, M. McKeown, Y. Fu, T. Nguyen, Y. Zhou, A. Lavrov, M. Shahrad, A. Fuchs, S. Payne, X. Liang et al., "Openpiton: An open source manycore research framework," in ACM SIGARCH Computer Architecture News, vol. 44, no. 2. ACM, 2016, pp. 217–232.
- [6] N. Satish, M. Harris, and M. Garland, "Designing efficient sorting algorithms for manycore gpus," in 2009 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 2009, pp. 1–10.
- [7] S. Binet, P. Calafiura, S. Snyder, W. Wiedenmann, and F. Winklmeier, "Harnessing multicores: Strategies and implementations in atlas," in *Journal of Physics: Conference Series*, vol. 219, no. 4. IOP Publishing, 2010, p. 042002.

# Appendix A - Code Listing

#### vmicro16.v

The single core RISC processor is defined in this file. It contains many submodules such as the decoder and local memory.

```
// This file contains multiple modules.
//   Verilator likes 1 file for each module
/* verilator lint_off DECLFILENAME */
/* verilator lint_off UNUSED */
/* verilator lint_off BLKSEQ */
/* verilator lint_off WIDTH */

// Include Vmicro16 ISA containing definitions for the bits
`include "vmicro16_isa.v"

`include "clog2.v"
`include "formal.v"

// This module aims to be a SYNCHRONOUS, WRITE_FIRST BLOCK RAM
//   https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf
//   https://www.xilinx.com/support/documentation/user_guides/ug383.pdf
//   https://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_4/ug901-vivado-synthesis.pdf
module vmicro16_bram # (
    parameter MEM_WIDTH = 16,
    parameter MEM_DEPTH = 64,
    parameter CORE_ID = 0,
    parameter USE_INITS = 0,
    parameter PARAM_DEFAULTS_R0 = 0,
    parameter PARAM_DEFAULTS_R1 = 0,
    parameter PARAM_DEFAULTS_R2 = 0,
    parameter PARAM_DEFAULTS_R3 = 0,
    parameter NAME = "BRAM"
) (
    input
               [`clog2(MEM_DEPTH)-1:0] mem_addr,
               [MEM_WIDTH-1:0]
    output reg [MEM_WIDTH-1:0]
);
    // memory vector
    (* ram_style = "block" *)
    reg [MEM_WIDTH-1:0] mem [0:MEM_DEPTH-1];

    // not synthesizable
    integer i;
    initial begin
        for (i = 0; i < MEM_DEPTH; i = i + 1) mem[i] = 0;
        mem[0] = PARAM_DEFAULTS_R0;
        mem[1] = PARAM_DEFAULTS_R1;
        mem[2] = PARAM_DEFAULTS_R2;
        mem[3] = PARAM_DEFAULTS_R3;

        if (USE_INITS) begin
            //`define TEST_SW
            `ifdef TEST_SW
            $readmemh("E:\\Projects\\uni\\vmicro16\\sw\\verilog_memh.txt", mem);
            `endif

            `define TEST_ASM
            `ifdef TEST_ASM
            $readmemh("E:\\Projects\\uni\\vmicro16\\sw\\asm.s.hex", mem);
            `endif

            //`define TEST_COND
            `ifdef TEST_COND
            mem[0] = {`VMICRO16_OP_MOVI, 3'h7, 8'hC0}; // lock
            `endif

            //`define TEST_CMP
            `ifdef TEST_CMP
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'h0A};
            mem[1] = {`VMICRO16_OP_MOVI, 3'h1, 8'h0B};
            mem[2] = {`VMICRO16_OP_CMP,  3'h1, 3'h0, 5'h1};
            `endif

            //`define TEST_LWEX
            `ifdef TEST_LWEX
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'hC5};
            mem[1] = {`VMICRO16_OP_LW,
            mem[2] = {`VMICRO16_OP_LW,
            mem[3] = {`VMICRO16_OP_LWEX,
                                         3'h0, 3'h0, 5'h1};
                                         3'h2, 3'h0, 5'h1};
```

```
            mem[4] = {`VMICRO16_OP_SWEX, 3'h3, 3'h0, 5'h1};
            `endif

            //`define TEST_MULTICORE
            `ifdef TEST_MULTICORE
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'h90};
            mem[1] = {`VMICRO16_OP_MOVI, 3'h1, 8'h33};
            mem[2] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[3] = {`VMICRO16_OP_MOVI, 3'h0, 8'h80};
            mem[4] = {`VMICRO16_OP_LW,   3'h2, 3'h0, 5'h0};
            mem[5] = {`VMICRO16_OP_MOVI, 3'h1, 8'h33};
            mem[6] = {`VMICRO16_OP_MOVI, 3'h1, 8'h33};
            mem[7] = {`VMICRO16_OP_MOVI, 3'h1, 8'h33};
            mem[8] = {`VMICRO16_OP_MOVI, 3'h0, 8'h91};
            mem[9] = {`VMICRO16_OP_SW,   3'h2, 3'h0, 5'h0};
            `endif

            //`define TEST_BR
            `ifdef TEST_BR
            mem[0] = {`VMICRO16_OP_MOVI,    3'h0, 8'h0};
            mem[1] = {`VMICRO16_OP_MOVI,    3'h3, 8'h3};
            mem[2] = {`VMICRO16_OP_MOVI,    3'h1, 8'h2};
            mem[3] = {`VMICRO16_OP_ARITH_U, 3'h0, 3'h1, 5'b11111};
            mem[4] = {`VMICRO16_OP_BR,      3'h0, `VMICRO16_OP_BR_U};
            mem[5] = {`VMICRO16_OP_MOVI,    3'h0, 8'hFF};
            `endif

            //`define ALL_TEST
            `ifdef ALL_TEST
            // Standard all test
            // REGS0
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'h81};
            mem[1] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0}; // MMU[0x81] = 6
            mem[2] = {`VMICRO16_OP_SW,   3'h2, 3'h0, 5'h1}; // MMU[0x82] = 6
            // GPIO0
            mem[3] = {`VMICRO16_OP_MOVI, 3'h0, 8'h90};
            mem[4] = {`VMICRO16_OP_MOVI, 3'h1, 8'hD};
            mem[5] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[6] = {`VMICRO16_OP_SW,   3'h2, 3'h0, 5'h0};
            // TIM0
                                                                                                                                                      mem[a] = { VMICRO16_UP_SW, mem[a] = { VMICRO16_UP_LW, // TIMO mem[7] = { VMICRO16_UP_LW, mem[8] = { VMICRO16_UP_LW, mem[8] = { VMICRO16_UP_MOVI, mem[9] = { VMICRO16_UP_MOVI, mem[10] = { VMICRO16_UP_MOVI, mem[13] = { VMICRO16_UP_MOVI, mem[13] = { VMICRO16_UP_MOVI, mem[14] = { VMICRO16_UP_MOVI, mem[14] = { VMICRO16_UP_SW, mem[14] = { VMICRO16_UP_MOVI, mem[15] = { VMICRO16_UP_MOVI, mem[16] = { VMICRO16_UP_MOVI, mem[17] = { VMICRO16_UP_MOVI, mem[18] = { VMICRO16_UP_SW, mem[18] = { VMICRO16_UP_SW, mem[18] = { VMICRO16_UP_SW, mem[19] 
  120
  121
  122
123
                                                                                                                                                                                                                                                                                                                                                                                                                            3'h0, 8'h07};
3'h3, 3'h0, 5'h03};
    124
125
                                                                                                                                                                                                                                                                                                                                                                                                                                   3'h0, 8'hA0};
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    // UARTO
// ascii A
                                                                                                                                                                                                                                                                                                                                                                                                                              3'h0, 8'h40; // UAB
3'h1, 8'h41; // asc
3'h1, 3'h0, 5'h0;
3'h1, 3'h0, 5'h0;
3'h1, 8'h42; // ascii B
3'h1, 3'h0, 5'h0;
3'h1, 8'h43; // ascii C
3'h1, 3'h0, 5'h0;
3'h1, 8'h44; // ascii D
3'h1, 8'h44; // ascii D
3'h1, 3'h0, 5'h0;
3'h1, 8'h45; // ascii D
mem[18] = {`VMICRO16_OP_MOVI,
mem[20] = {`VMICRO16_OP_SW,
mem[20] = {`VMICRO16_OP_SW,
mem[21] = {`VMICRO16_OP_SW,
mem[22] = {`VMICRO16_OP_MOVI,
mem[22] = {`VMICRO16_OP_MOVI,
mem[24] = {`VMICRO16_OP_SW,
mem[24] = {`VMICRO16_OP_LW,
mem[24] = {`VMICRO16_OP_
                                                                                                                                                                                                                                                                                                                                                                                                                                   3'h1, 3'h0, 5'h0};
3'h1, 8'h46}; // ascii E
                                                                                                                                                                                                                                                                                                                                                                                                                                     3'h1, 3'h0, 5'h0};
                                                                                                                                                                                                                                                                                                                                                                                                                                   3'h0, 8'hC0};
3'h1, 8'hA};
3'h1, 3'h0, 5'h5};
3'h2, 3'h0, 5'h5};
mem[26] = {`VMICRO16_OP_LW,
// GPIO1 (SSD 24-bit port)
mem[26] = {`VMICRO16_OP_MOVI,
mem[27] = {`VMICRO16_OP_MOVI,
mem[28] = {`VMICRO16_OP_LW,
// GPIO2
mem[30] = {`VMICRO16_OP_MOVI,
mem[31] = {`VMICRO16_OP_MOVI,
mem[32] = {`VMICRO16_OP_SW,
                                                                                                                                                                                                                                                                                                                                                                                                                                   3'h0, 8'h91};
3'h1, 8'h12};
3'h1, 3'h0, 5'h0};
3'h2, 3'h0, 5'h0};
                                                                                                                                                                                                                                                                                                                                                                                                                                3'h0, 8'h92};
3'h1, 8'h56};
3'h1, 3'h0, 5'h0};
    `endif

    //`define TEST_BRAM
    `ifdef TEST_BRAM
        // 2 core BRAM0 test
        mem[0] = {`VMICRO16_OP_MOVI,    3'h0, 8'hC0};
        mem[1] = {`VMICRO16_OP_MOVI,    3'h1, 8'hA};
        mem[2] = {`VMICRO16_OP_SW,      3'h1, 3'h0, 5'h5};
        mem[3] = {`VMICRO16_OP_LW,      3'h2, 3'h0, 5'h5};
    `endif
    always @(posedge clk) begin
        // synchronous WRITE_FIRST (page 13)
        if (mem_we) begin
            mem[mem_addr] <= mem_in;
            $display($time, "\t\t%s[%h] <= %h",
                     NAME, mem_addr, mem_in);
        end else
            mem_out <= mem[mem_addr];
    end

    // TODO: Reset impl = every clock while reset is asserted, clear each cell
    //       one at a time, mem[i++] <= 0
endmodule

module vmicro16_core_mmu # (
    parameter MEM_WIDTH    = 16,
    parameter MEM_DEPTH    = 64,

    parameter CORE_ID      = 3'h0,
    parameter CORE_ID_BITS = `clog2(`CORES)
) (
    input clk,
    input reset,

    input  req,
    output busy,

    // From core
    input      [MEM_WIDTH-1:0] mmu_addr,
    input      [MEM_WIDTH-1:0] mmu_in,
    input                      mmu_we,
    input                      mmu_lwex,
    input                      mmu_swex,
    output reg [MEM_WIDTH-1:0] mmu_out,

    // interrupts
    output reg [`DATA_WIDTH*`DEF_NUM_INT-1:0] ints_vector,
    output reg [`DEF_NUM_INT-1:0]             ints_mask,

    // TO APB interconnect
    output reg [`APB_WIDTH-1:0] M_PADDR,
    output reg                  M_PWRITE,
    output reg                  M_PSELx,
    output reg                  M_PENABLE,
    output reg [MEM_WIDTH-1:0]  M_PWDATA,
    input      [MEM_WIDTH-1:0]  M_PRDATA,
    input                       M_PREADY
);
    reg  [MEM_WIDTH-1:0] per_out = 0;
    wire [MEM_WIDTH-1:0] tim0_out;

    assign busy = req || (mmu_state == MMU_STATE_T2);

    // more luts than below but easier
    //wire tim0_en = (mmu_addr >= `DEF_MMU_TIM0_S)
    //            && (mmu_addr <= `DEF_MMU_TIM0_E);
    //wire sreg_en = (mmu_addr >= `DEF_MMU_SREG_S)
    //            && (mmu_addr <= `DEF_MMU_SREG_E);
    //wire intv_en = (mmu_addr >= `DEF_MMU_INTSV_S)
    //            && (mmu_addr <= `DEF_MMU_INTSV_E);
    //wire intm_en = (mmu_addr >= `DEF_MMU_INTSM_S)
    //            && (mmu_addr <= `DEF_MMU_INTSM_E);

    wire tim0_en = ~mmu_addr[12] && ~mmu_addr[9] && ~mmu_addr[7];
    wire sreg_en =  mmu_addr[7]  && ~mmu_addr[4] && ~mmu_addr[5];
    wire intv_en =  mmu_addr[8]  && ~mmu_addr[3];
    wire intm_en =  mmu_addr[8]  &&  mmu_addr[3];

    wire apb_en  = !(|{tim0_en, sreg_en, intv_en, intm_en});
    wire tim0_we = (tim0_en && mmu_we);
    wire intv_we = (intv_en && mmu_we);
    wire intm_we = (intm_en && mmu_we);

    // Special register selects
    localparam SPECIAL_REGS = 8;
    wire [MEM_WIDTH-1:0] sr_val;

    // Interrupt vector and mask
    initial ints_vector = 0;
    initial ints_mask   = 0;

    wire [2:0] intv_addr = mmu_addr[`clog2(`DEF_NUM_INT)-1:0];
    always @(posedge clk)
        if (intv_we)
            ints_vector[intv_addr*`DATA_WIDTH +: `DATA_WIDTH] <= mmu_in;

    always @(posedge clk)
        if (intm_we)
            ints_mask <= mmu_in;

    always @(intm_we)
        $display($time, "\tC%d\t\tintm_we W: %b", CORE_ID, ints_mask);

    // Output port
    always @(*)
        if      (tim0_en) mmu_out = tim0_out;
        else if (sreg_en) mmu_out = sr_val;
        else if (intv_en) mmu_out = ints_vector[mmu_addr[2:0]*`DATA_WIDTH
                                                +: `DATA_WIDTH];

    // APB master to slave interface
    always @(posedge clk)
        if (reset) begin
            mmu_state <= MMU_STATE_T1;
            M_PENABLE <= 0;
            M_PADDR   <= 0;
            M_PWDATA  <= 0;
            M_PSELx   <= 0;
            M_PWRITE  <= 0;
        end
        else
            casex (mmu_state)
                MMU_STATE_T1: begin
                    if (req && apb_en) begin
                        M_PADDR   <= {mmu_lwex,
                                      mmu_swex,
                                      CORE_ID[CORE_ID_BITS-1:0],
                                      mmu_addr[MEM_WIDTH-1:0]};

                        M_PWDATA  <= mmu_in;
                        M_PSELx   <= 1;
                        M_PWRITE  <= mmu_we;

                        mmu_state <= MMU_STATE_T2;
                    end
                end

            `ifdef FIX_T3
                MMU_STATE_T2: begin
                    M_PENABLE <= 1;

                    if (M_PREADY == 1'b1) begin
                        mmu_state <= MMU_STATE_T3;
                    end
                end

                MMU_STATE_T3: begin
                    // Slave has output a ready signal (finished)
                    M_PENABLE <= 0;
                    M_PADDR   <= 0;
                    M_PWDATA  <= 0;
                    M_PSELx   <= 0;
                    M_PWRITE  <= 0;
                    // Clock the peripheral output into a reg,
                    //   to output on the next clock cycle
                    per_out   <= M_PRDATA;

                    mmu_state <= MMU_STATE_T1;
                end
            `else
                // No FIX_T3
                MMU_STATE_T2: begin
                    if (M_PREADY == 1'b1) begin
                        M_PENABLE <= 0;
                        M_PADDR   <= 0;
                        M_PWDATA  <= 0;
                        M_PSELx   <= 0;
                        M_PWRITE  <= 0;
                        // Clock the peripheral output into a reg,
                        //   to output on the next clock cycle
                        per_out   <= M_PRDATA;

                        mmu_state <= MMU_STATE_T1;
                    end else begin
                        M_PENABLE <= 1;
                    end
                end
            `endif
            endcase

        .NAME       ("ram_sr")
    ) ram_sr (
        .clk        (clk),
        .reset      (reset),
        .mem_addr   (mmu_addr[`clog2(SPECIAL_REGS)-1:0]),
        .mem_in     (),
        .mem_we     (),
        .mem_out    (sr_val)
    );

        .USE_INITS  (0),
        .NAME       ("TIM0")
    ) TIM0 (
        .clk        (clk),
        .reset      (reset),
        .mem_addr   (mmu_addr[7:0]),
        .mem_in     (mmu_in),
        .mem_we     (tim0_we),
        .mem_out    (tim0_out)
    );
endmodule

module vmicro16_regs # (
    parameter CELL_WIDTH        = 16,
    parameter CELL_DEPTH        = 8,
    parameter CELL_SEL_BITS     = `clog2(CELL_DEPTH),
    parameter CELL_DEFAULTS     = 0,
    parameter DEBUG_NAME        = "",
    parameter CORE_ID           = 0,
    parameter PARAM_DEFAULTS_R0 = 16'h0000,
    parameter PARAM_DEFAULTS_R1 = 16'h0000
) (
    input clk,
    input reset,
    // Dual port register reads
    input  [CELL_SEL_BITS-1:0] rs1, // port 1
    output [CELL_WIDTH-1   :0] rd1,
    //input  [CELL_SEL_BITS-1:0] rs2, // port 2
    //output [CELL_WIDTH-1   :0] rd2,
    // EX/WB final stage write back
    input                      we,
    input  [CELL_SEL_BITS-1:0] ws1,
    input  [CELL_WIDTH-1:0]    wd
);
    (* ram_style = "distributed" *)
    reg [CELL_WIDTH-1:0] regs [0:CELL_DEPTH-1] /*verilator public_flat*/;

    // Initialise registers with default values
    //   Really only used for special registers used by the soc
    // TODO: How to do this on reset?
    integer i;
    initial
        if (CELL_DEFAULTS)
            $readmemh(CELL_DEFAULTS, regs);
        else begin
            for (i = 0; i < CELL_DEPTH; i = i + 1)
                regs[i] = 0;
            regs[0] = PARAM_DEFAULTS_R0;
            regs[1] = PARAM_DEFAULTS_R1;
        end

                           `ifdef ICARUS
                                 always @(regs)
                                                    regs[0], regs[1], regs[2], regs[3], regs[4], regs[5], regs[6], regs[7]);
    always @(posedge clk)
        if (reset) begin
            for (i = 0; i < CELL_DEPTH; i = i + 1)
                regs[i] <= 0;
            regs[0] <= PARAM_DEFAULTS_R0;
            regs[1] <= PARAM_DEFAULTS_R1;
        end
        else if (we) begin
            // Perform the write
            regs[ws1] <= wd;
        end

    // sync writes, async reads
    assign rd1 = regs[rs1];
    //assign rd2 = regs[rs2];
endmodule

module vmicro16_dec # (
    parameter INSTR_WIDTH    = 16,
    parameter INSTR_OP_WIDTH = 5,
    parameter INSTR_RS_WIDTH = 3,
    parameter ALU_OP_WIDTH   = 5
) (
    //input clk,   // not used yet (all combinational)
    //input reset, // not used yet (all combinational)

    input  [INSTR_WIDTH-1:0]      instr,

    output [INSTR_OP_WIDTH-1:0]   opcode,
    output [INSTR_RS_WIDTH-1:0]   rd,
    output [INSTR_RS_WIDTH-1:0]   ra,
    output [3:0]                  imm4,
    output [7:0]                  imm8,
    output [11:0]                 imm12,
    output [4:0]                  simm5,

    // This can be freely increased without affecting the isa
    output reg [ALU_OP_WIDTH-1:0] alu_op,

    output reg has_imm4,
    output reg has_imm8,
    output reg has_imm12,
    output reg has_we,
    output reg has_br,
    output reg has_mem,
    output reg has_mem_we,
    output reg has_cmp,

    output halt,
    output intr,

    output reg has_lwex,
    output reg has_swex

    // TODO: Use to identify bad instruction and
    //       raise exceptions
    //,output is_bad
);
    assign opcode = instr[15:11];
    assign rd     = instr[10:8];
    assign ra     = instr[7:5];
    assign imm4   = instr[3:0];
    assign imm8   = instr[7:0];
    assign imm12  = instr[11:0];
    assign simm5  = instr[4:0];

    // alu_op
    always @(*) case (opcode)
        `VMICRO16_OP_SPCL: casez (instr[11:0])
            `VMICRO16_OP_SPCL_NOP,
            `VMICRO16_OP_SPCL_HALT,
            `VMICRO16_OP_SPCL_INTR:   alu_op = `VMICRO16_ALU_NOP;
            default:                  alu_op = `VMICRO16_ALU_NOP; endcase

        `VMICRO16_OP_LW:              alu_op = `VMICRO16_ALU_LW;
        `VMICRO16_OP_SW:              alu_op = `VMICRO16_ALU_SW;
        `VMICRO16_OP_LWEX:            alu_op = `VMICRO16_ALU_LW;
        `VMICRO16_OP_SWEX:            alu_op = `VMICRO16_ALU_SW;

        `VMICRO16_OP_MOV:             alu_op = `VMICRO16_ALU_MOV;
        `VMICRO16_OP_MOVI:            alu_op = `VMICRO16_ALU_MOVI;

        `VMICRO16_OP_BR:              alu_op = `VMICRO16_ALU_BR;
        `VMICRO16_OP_MULT:            alu_op = `VMICRO16_ALU_MULT;

        `VMICRO16_OP_CMP:             alu_op = `VMICRO16_ALU_CMP;
        `VMICRO16_OP_SETC:            alu_op = `VMICRO16_ALU_SETC;

        `VMICRO16_OP_BIT: casez (simm5)
            `VMICRO16_OP_BIT_OR:      alu_op = `VMICRO16_ALU_BIT_OR;
            `VMICRO16_OP_BIT_XOR:     alu_op = `VMICRO16_ALU_BIT_XOR;
            `VMICRO16_OP_BIT_AND:     alu_op = `VMICRO16_ALU_BIT_AND;
            `VMICRO16_OP_BIT_NOT:     alu_op = `VMICRO16_ALU_BIT_NOT;
            `VMICRO16_OP_BIT_LSHFT:   alu_op = `VMICRO16_ALU_BIT_LSHFT;
            `VMICRO16_OP_BIT_RSHFT:   alu_op = `VMICRO16_ALU_BIT_RSHFT;
            default:                  alu_op = `VMICRO16_ALU_BAD; endcase

        `VMICRO16_OP_ARITH_U: casez (simm5)
            `VMICRO16_OP_ARITH_UADD:  alu_op = `VMICRO16_ALU_ARITH_UADD;
            `VMICRO16_OP_ARITH_USUB:  alu_op = `VMICRO16_ALU_ARITH_USUB;
            `VMICRO16_OP_ARITH_UADDI: alu_op = `VMICRO16_ALU_ARITH_UADDI;
            default:                  alu_op = `VMICRO16_ALU_BAD; endcase

        `VMICRO16_OP_ARITH_S: casez (simm5)
            `VMICRO16_OP_ARITH_SADD:  alu_op = `VMICRO16_ALU_ARITH_SADD;
            `VMICRO16_OP_ARITH_SSUB:  alu_op = `VMICRO16_ALU_ARITH_SSUB;
            `VMICRO16_OP_ARITH_SSUBI: alu_op = `VMICRO16_ALU_ARITH_SSUBI;
            default:                  alu_op = `VMICRO16_ALU_BAD; endcase

        default: begin
            alu_op = `VMICRO16_ALU_NOP;
            $display($time, "\tDEC: unknown opcode: %h ... NOPPING", opcode);
        end
    endcase

    // Special opcodes
    //assign nop  = ((opcode == `VMICRO16_OP_SPCL) & (~instr[0]));
    assign halt = ((opcode == `VMICRO16_OP_SPCL) & instr[0]);
    assign intr = ((opcode == `VMICRO16_OP_SPCL) & instr[1]);

    // Register writes
    always @(*) case (opcode)
        `VMICRO16_OP_LWEX,
        `VMICRO16_OP_SWEX,
        `VMICRO16_OP_LW,
        `VMICRO16_OP_MOV,
        `VMICRO16_OP_MOVI,
        //`VMICRO16_OP_MOVI_L,
        `VMICRO16_OP_ARITH_U,
        `VMICRO16_OP_ARITH_S,
        `VMICRO16_OP_SETC,
        `VMICRO16_OP_BIT,
        `VMICRO16_OP_MULT:  has_we = 1'b1;
        default:            has_we = 1'b0;
    endcase

    // Contains 4-bit immediate
    always @(*)
        has_imm4 = 1'b0;

                            has_imm8 = 1'b1;
                            has_imm8 = 1'b0;
        `VMICRO16_OP_BR:

    //// Contains 12-bit immediate
    //always @(*) case (opcode)
    //                      has_imm12 = 1'b1;
    //    default:          has_imm12 = 1'b0;
    //endcase

    // Will branch the pc
    always @(*) case (opcode)
        `VMICRO16_OP_BR:    has_br = 1'b1;
        default:            has_br = 1'b0;
    endcase

    // Requires external memory
    always @(*) case (opcode)
        `VMICRO16_OP_LW,
        `VMICRO16_OP_SW,
        `VMICRO16_OP_LWEX,
        `VMICRO16_OP_SWEX:  has_mem = 1'b1;
        default:            has_mem = 1'b0;
    endcase

    // Requires external memory write
    always @(*) case (opcode)
        `VMICRO16_OP_SW,
        `VMICRO16_OP_SWEX:  has_mem_we = 1'b1;
        default:            has_mem_we = 1'b0;
    endcase

                     default:
             `VMICRO16
default:
endcase
endmodule
module vmicro16_alu # (
    parameter OP_WIDTH   = 5,
    parameter DATA_WIDTH = 16,
    parameter CORE_ID    = 0
) (
    // input clk, // TODO: make clocked

    input      [OP_WIDTH-1:0]   op,
    input      [DATA_WIDTH-1:0] a, // rs1/dst
    input      [DATA_WIDTH-1:0] b, // rs2
    input      [3:0]            flags,
    output reg [DATA_WIDTH-1:0] c
);
    localparam TOP_BIT = (DATA_WIDTH-1);
    // 17-bit register
    reg [DATA_WIDTH:0] cmp_tmp = 0; // = {carry, [15:0]}
    wire r_setc;

                                    c = {DATA_WIDTH{1'b0}};

        // load/store addresses (use value in rd2)
        `VMICRO16_ALU_LW,
        `VMICRO16_ALU_SW:           c = b;
        // bitwise operations
        `VMICRO16_ALU_BIT_OR:       c = a | b;
        `VMICRO16_ALU_BIT_XOR:      c = a ^ b;
        `VMICRO16_ALU_BIT_AND:      c = a & b;
        `VMICRO16_ALU_BIT_NOT:      c = ~(b);
        `VMICRO16_ALU_BIT_LSHFT:    c = a << b;
        `VMICRO16_ALU_BIT_RSHFT:    c = a >> b;

        `VMICRO16_ALU_MOV:
        `VMICRO16_ALU_MOVI:
        `VMICRO16_ALU_MOVI_L:

        // TODO: ALU should have simm5 as input
        `VMICRO16_ALU_ARITH_UADDI:  c = a + b;

        `ifdef DEF_ALU_HW_MULT
        `VMICRO16_ALU_MULT:         c = a * b;
        `endif

        `VMICRO16_ALU_CMP: begin
            // TODO: Do a-b in 17-bit register
            //       Set zero, overflow, carry, signed bits in result
            cmp_tmp = a - b;
            c = 0;

            // N   Negative condition code flag
            // Z   Zero condition code flag
            // C   Carry condition code flag
            // V   Overflow condition code flag
            c[`VMICRO16_SFLAG_N] = cmp_tmp[TOP_BIT];
            c[`VMICRO16_SFLAG_Z] = (cmp_tmp == 0);
            c[`VMICRO16_SFLAG_C] = 0; //cmp_tmp[TOP_BIT+1]; // not used

            // Overflow flag
            // https://stackoverflow.com/questions/30957188/
            // https://github.com/bendl/prco304/blob/master/prco_core/rtl/prco_alu.v#L50
            case (cmp_tmp[TOP_BIT+1:TOP_BIT])
                2'b01:   c[`VMICRO16_SFLAG_V] = 1;
                2'b10:   c[`VMICRO16_SFLAG_V] = 1;
                default: c[`VMICRO16_SFLAG_V] = 0;
            endcase

            $display($time, "\tC%02h: ALU CMP: %h %h = %h = %b", CORE_ID, a, b, cmp_tmp, c[3:0]);
        end

        `VMICRO16_ALU_SETC: c = { {15{1'b0}}, r_setc };

        // TODO: Parameterise
        default: begin
            $display($time, "\tALU: unknown op: %h", op);
            c = 0;
            cmp_tmp = 0;
        end
    endcase

    branch setc_check (
        .flags  (flags),
        .cond   (b[7:0]),
        .en     (r_setc)
    );
endmodule

// flags = 4 bit r_cmp_flags register
// cond  = 8 bit VMICRO16_OP_BR_? value. See vmicro16_isa.v
module branch (
    input [3:0] flags,
    input [7:0] cond,
    output reg  en
);
    always @(*)
        case (cond)
            `VMICRO16_OP_BR_U:  en = 1;
            `VMICRO16_OP_BR_E:  en = (flags[`VMICRO16_SFLAG_Z] == 1);
            `VMICRO16_OP_BR_NE: en = (flags[`VMICRO16_SFLAG_Z] == 0);
'WMICRO16_0P_BR_D: en = (flags['VMICRO16_SFLAG_Z] == 0) && (flags['VMICRO16_SFLAG_Z] == 0) && (flags['VMICRO16_SFLAG_Z] == 0) && (flags['VMICRO16_SFLAG_Z] == 1];

'WMICRO16_0P_BR_LE: en = (flags['VMICRO16_SFLAG_Z] == flags['VMICRO16_SFLAG_N]);

'WMICRO16_0P_BR_LE: en = (flags['VMICRO16_SFLAG_Z] == flags['VMICRO16_SFLAG_N]);

'WMICRO16_OP_BR_LE: en = (flags['VMICRO16_SFLAG_Z] == flags['VMICRO16_SFLAG_N]);

            default:            en = 0;
        endcase
endmodule

module vmicro16_core # (
    parameter DATA_WIDTH        = 16,
    parameter MEM_INSTR_DEPTH   = 64,
    parameter MEM_SCRATCH_DEPTH = 64,
    parameter MEM_WIDTH         = 16,

    parameter CORE_ID           = 3'h0
) (
    input        clk,
    input        reset,

    output [7:0] dbug,
    output       halt,

    // interrupt sources
    input  [`DEF_NUM_INT-1:0]             ints,
    input  [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_data,
    output [`DEF_NUM_INT-1:0]             ints_ack,

    // APB master to slave interface (apb_intercon)
    output [`APB_WIDTH-1:0] w_PADDR,
    output                  w_PWRITE,
    output                  w_PSELx,
    output                  w_PENABLE,
    output [DATA_WIDTH-1:0] w_PWDATA,
    input  [DATA_WIDTH-1:0] w_PRDATA,
    input                   w_PREADY
          __sTR_MEM
__ster interface t
__stereg ['APB_WIDTH-1:0]
output reg
output reg
output reg
output reg
output reg
[DATA_WIDTH-1:0]
input
input
'endif
);
                w2_PSELx,
                                                                                         w2 PENABLE.
                                                                                        w2_PWDATA,
                                               [DATA_WIDTH-1:0] W2_PWDATA,
                                                                                         w2_PREADY
                      localparam STATE_IF = 0;
localparam STATE_R1 = 1;
localparam STATE_R2 = 2;
localparam STATE_ME = 3;
localparam STATE_WB = 4;
localparam STATE_FE = 5;
localparam STATE_IDE = 6;
localparam STATE_IDE = 6;
localparam STATE_IDE = 5;
reg [2:0] r_state = STATE_IF;
                       reg [DATA_WIDTH-1:0] r_pc = 16'h0000;
reg [DATA_WIDTH-1:0] r_pc_saved = 16'h0000;
reg [DATA_WIDTH-1:0] r_instr = 16'h0000;
wire [DATA_WIDTH-1:0] w_mem_instr_out;
wire w_halt;
                       assign dbug = {7'h00, w_halt};
assign halt = w_halt;
    wire [4:0]            r_instr_opcode;
    wire [4:0]            r_instr_alu_op;
    wire [2:0]            r_instr_rsd;
    wire [2:0]            r_instr_rsa;
    reg  [DATA_WIDTH-1:0] r_instr_rdd = 0;
    reg  [DATA_WIDTH-1:0] r_instr_rda = 0;
    wire [3:0]            r_instr_imm4;
    wire [7:0]            r_instr_imm8;
    wire [4:0]            r_instr_simm5;
    wire                  r_instr_has_imm4;
    wire                  r_instr_has_imm8;
    wire                  r_instr_has_we;
    wire                  r_instr_has_br;
    wire                  r_instr_has_cmp;
    wire                  r_instr_has_mem;
    wire                  r_instr_has_mem_we;
    wire                  r_instr_halt;
    wire                  r_instr_has_lwex;
    wire                  r_instr_has_swex;

    wire [DATA_WIDTH-1:0] r_alu_out;

    wire [DATA_WIDTH-1:0] r_mem_scratch_addr = $signed(r_alu_out) + $signed(r_instr_simm5);
    wire [DATA_WIDTH-1:0] r_mem_scratch_in   = r_instr_rdd;
    wire [DATA_WIDTH-1:0] r_mem_scratch_out;
    wire                  r_mem_scratch_we   = r_instr_has_mem_we && (r_state == STATE_ME);
    reg                   r_mem_scratch_req  = 0;
    wire                  r_mem_scratch_busy;

// branching
wire w_intr;
wire w_branch_en;
wire w_branching = r_instr_has_br && w_branch_en;
reg [3:0] r_cmp_flags = 4'h00; // N, Z, C, V

// 2 cycle register fetch
always @(*) begin
    r_reg_rs1 = 0;
    if (r_state == STATE_R1)
        r_reg_rs1 = r_instr_rsd;
    else if (r_state == STATE_R2)
        r_reg_rs1 = r_instr_rsa;
    else
        r_reg_rs1 = 3'h0;
end

reg regs_use_int = 0;
`ifdef DEF_ENABLE_INT
wire [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_vector;
wire [`DEF_NUM_INT-1:0] ints_mask;
wire has_int = ints & ints_mask;
reg int_pending = 0;
reg int_pending_ack = 0;
always @(posedge clk)
    if (int_pending_ack)
        // We've now branched to the isr
        int_pending <= 0;
    else if (has_int)
        // Notify fsm to switch to the ints_vector at the last stage
```

```
        int_pending <= 1;
    else if (w_intr)
        // Return to Interrupt instruction called,
        // so we've finished with the interrupt
        int_pending <= 0;
`endif

// Next program counter logic
reg [`DATA_WIDTH-1:0] next_pc = 0;
always @(posedge clk)
    if (reset)
        r_pc <= 0;
    else if (r_state == STATE_WB) begin
        `ifdef DEF_ENABLE_INT
        if (int_pending) begin
            int_pending_ack <= 0;
        `endif
        end else if (r_pc < (MEM_INSTR_DEPTH-1)) begin
            // normal increment
            // pc <= pc + 1
            r_pc <= r_pc + 1;
            `ifdef DEF_ENABLE_INT
            int_pending_ack <= 0;
            `endif
        end
    end // end r_state == STATE_WB
    else if (r_state == STATE_HALT) begin
            int_pending_ack <= 0;
        end
        `endif

`ifndef DEF_CORE_HAS_INSTR_MEM
initial w2_PSELx = 0;
initial w2_PENABLE = 0;
initial w2_PADDR = 0;
`endif

// cpu state machine
always @(posedge clk)
    if (reset) begin
        r_state <= STATE_IF;
        r_instr <= 0;
        r_mem_scratch_req <= 0;
        r_instr_rdd <= 0;
        r_instr_rda <= 0;
    end
    else begin
            $display($time, "\tC%02h: PC: %h", CORE_ID, r_pc);
            $display($time, "\tC%02h: INSTR: %h", CORE_ID, w_mem_instr_out);
            r_state <= STATE_R1;
        end
        `else
        // wait for global instruction rom to give us our instruction
        if (r_state == STATE_IF) begin
            // wait for ready signal
            if (!w2_PREADY) begin
                w2_PSELx <= 1;
                w2_PWRITE <= 0;
                w2_PENABLE <= 1;
                w2_PWDATA <= 0;
                w2_PADDR <= r_pc;
            end else begin
                w2_PSELx <= 0;
                w2_PWRITE <= 0;
                w2_PENABLE <= 0;
                w2_PWDATA <= 0;
```

```
                r_instr <= w2_PRDATA;
                $display("");
                $display($time, "\tC%02h: PC: %h", CORE_ID, r_pc);
                $display($time, "\tC%02h: INSTR: %h", CORE_ID, w2_PRDATA);
                r_state <= STATE_R1;
            end
        end
        `endif
        else if (r_state == STATE_R1) begin

            if (r_instr_has_mem) begin
                r_state <= STATE_ME;
                // Pulse req
                r_mem_scratch_req <= 1;
            end else
                r_state <= STATE_WB;
        end
        else if (r_state == STATE_ME) begin
            // Pulse req
            r_mem_scratch_req <= 0;
            // Wait for MMU to finish
            if (!r_mem_scratch_busy)
                r_state <= STATE_WB;
        end
        else if (r_state == STATE_WB) begin
            if (r_instr_has_cmp) begin
                $display($time, "\tC%02h: CMP: %h", CORE_ID, r_alu_out[3:0]);
                r_cmp_flags <= r_alu_out[3:0];
            end
            r_state <= STATE_FE;
        end
        else if (r_state == STATE_FE)
            r_state <= STATE_IF;
        else if (r_state == STATE_HALT) begin
            `ifdef DEF_ENABLE_INT
            if (int_pending) begin
                r_state <= STATE_FE;
            end
            `endif
        end

`ifdef DEF_CORE_HAS_INSTR_MEM
// Instruction ROM
(* rom_style = "distributed" *)
vmicro16_bram # (
    .MEM_WIDTH (DATA_WIDTH),
    .MEM_DEPTH (MEM_INSTR_DEPTH),
    .CORE_ID (CORE_ID),
    .USE_INITS (1),
    .NAME ("INSTR_MEM")
) mem_instr (
    .clk (clk),
    .reset (reset),
    // port 1
    .mem_addr (r_pc),
    .mem_in (0),
    .mem_we (1'b0), // ROM
    .mem_out (w_mem_instr_out)
);
`endif

// MMU
vmicro16_core_mmu # (
    .MEM_WIDTH (DATA_WIDTH),
    .MEM_DEPTH (MEM_SCRATCH_DEPTH),
    .CORE_ID (CORE_ID)
) mmu (
    .clk (clk),
    .reset (reset),
    .req (r_mem_scratch_req),
    .busy (r_mem_scratch_busy),
    // interrupts
    .ints_vector (ints_vector),
    .ints_mask (ints_mask),
    // port 1
    .mmu_addr (r_mem_scratch_addr),
    .mmu_in (r_mem_scratch_in),
    .mmu_we (r_mem_scratch_we),
    .mmu_lwex (r_instr_has_lwex),
    .mmu_swex (r_instr_has_swex),
    .mmu_out (r_mem_scratch_out),
    // APB master to slave
    .M_PADDR (w_PADDR),
    .M_PWRITE (w_PWRITE),
    .M_PSELx (w_PSELx),
    .M_PENABLE (w_PENABLE),
    .M_PWDATA (w_PWDATA),
    .M_PRDATA (w_PRDATA),
    .M_PREADY (w_PREADY)
);

// Instruction decoder
vmicro16_dec dec (
    // input
```

```
    .instr (r_instr),
    // output async
    .opcode (),
    .rd (r_instr_rsd),
    .ra (r_instr_rsa),
    .imm4 (r_instr_imm4),
    .imm8 (r_instr_imm8),
    .imm12 (),
    .simm5 (r_instr_simm5),
    .alu_op (r_instr_alu_op),
    .has_imm4 (r_instr_has_imm4),
    .has_imm8 (r_instr_has_imm8),
    .has_we (r_instr_has_we),
    .has_br (r_instr_has_br),
    .has_cmp (r_instr_has_cmp),
    .has_mem (r_instr_has_mem),
    .has_mem_we (r_instr_has_mem_we),
    .halt (w_halt),
    .intr (w_intr),
    .has_lwex (r_instr_has_lwex),
    .has_swex (r_instr_has_swex)
);

// Software registers
vmicro16_regs # (
    .CORE_ID (CORE_ID),
    .CELL_WIDTH (`DATA_WIDTH)
) regs (
    .clk (clk),
    .reset (reset),
    // async port 0
    .rs1 (r_reg_rs1),

    .we (r_reg_we && ~regs_use_int),
    .ws1 (r_instr_rsd),
    .wd (r_reg_wd)
                                 );
`endif
                                 );
branch branch_check (
    .flags (r_cmp_flags),
    .cond (r_instr_imm8),
    .en (w_branch_en)
);
endmodule
```

#### vmicro16\_soc.v

```
else
  if (hold)
    hold <= hold - 1;

// Vmicro16 multi-core SoC with various peripherals
// and interrupts
module vmicro16_soc (
    input clk,
    input reset,
    // UART0
    input uart_rx,
    output uart_tx,
    //
    output [`APB_GPIO0_PINS-1:0] gpio0,
    output [`APB_GPIO1_PINS-1:0] gpio1,
    output [`APB_GPIO2_PINS-1:0] gpio2,
    //
    output halt,
    //
    output [`CORES-1:0] dbug0,
    output [`CORES*8-1:0] dbug1
);
wire [`CORES-1:0] w_halt;
assign halt = &w_halt;
assign dbug0 = w_halt;

// Watchdog reset pulse signal.
// Passed to pow_reset to generate a longer reset pulse
wire wdreset;
wire prog_prog;

// Set high if a bus stall or error occurs.
// This will reset the whole SoC!
wire bus_reset;

// soft register reset hold for brams and registers
wire soft_reset;
`ifdef DEF_GLOBAL_RESET
pow_reset # (
    .INIT (1),
    .N (8)
) por_inst (
    .reset (reset),
    .resethold (soft_reset)
);
`else
assign soft_reset = 0;
`endif

// Peripherals (master to slave)
wire [`APB_WIDTH-1:0] M_PADDR;
wire M_PWRITE;
wire [`SLAVES-1:0] M_PSELx;
wire M_PENABLE;
wire [`DATA_WIDTH-1:0] M_PWDATA;
wire [`SLAVES*`DATA_WIDTH-1:0] M_PRDATA;
wire [`SLAVES-1:0] M_PREADY; // input
wire [`CORES-1:0] w_PENABLE;

// Interrupts
`ifdef DEF_ENABLE_INT
wire [`DEF_NUM_INT-1:0] ints;
wire [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_data;
assign ints[7:1] = 0;
assign ints_data[`DEF_NUM_INT*`DATA_WIDTH-1:`DATA_WIDTH] =
    {`DEF_NUM_INT*(`DATA_WIDTH-1){1'b0}};
`endif

apb_intercon_s # (
    .MASTER_PORTS (`CORES),
    .SLAVE_PORTS (`SLAVES),
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .HAS_PSELX_ADDR (1)
) apb (
    .clk (clk),
    .reset (soft_reset),
    // APB master to slave
    .S_PADDR (w_PADDR),
    .S_PWRITE (w_PWRITE),
    .S_PSELx (w_PSELx),
    .S_PENABLE (w_PENABLE),
    .S_PWDATA (w_PWDATA),
    .S_PRDATA (w_PRDATA),
    .S_PREADY (w_PREADY),
    // shared bus
    .M_PADDR (M_PADDR),
    .M_PWRITE (M_PWRITE),
    .M_PSELx (M_PSELx),
    .M_PENABLE (M_PENABLE),
    .M_PWDATA (M_PWDATA),
    .M_PRDATA (M_PRDATA),
    .M_PREADY (M_PREADY)
);

                         `ifdef DEF_USE_BUS_RESET
```

```
vmicro16_psel_err_apb error_apb (
    .clk (clk),
    .reset (),
    // apb slave to master interface
    .S_PADDR (),
    .S_PWRITE (),
    .S_PSELx (M_PSELx[`APB_PSELX_PERR0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (),
    .S_PRDATA (),
    .S_PREADY (M_PREADY[`APB_PSELX_PERR0]),
    // Error interrupt to reset the bus
    .err_i (bus_reset)
);
                       `endif
`ifdef DEF_USE_WATCHDOG
vmicro16_watchdog_apb # (
    .BUS_WIDTH (`APB_WIDTH),
    .NAME ("WDOG0")
) wdog0_apb (
    .clk (clk),
    .reset (),
    // apb slave to master interface
    .S_PADDR (),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_WDOG0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (),
    .S_PRDATA (),
    .S_PREADY (M_PREADY[`APB_PSELX_WDOG0]),
    .wdreset (wdreset)
);
`endif

vmicro16_gpio_apb # (
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .PORTS (`APB_GPIO0_PINS),
    .NAME ("GPIO0")
) gpio0_apb (
    .clk (clk),
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_GPIO0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_GPIO0*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_GPIO0]),
    .gpio (gpio0)
);

// GPIO1 for Seven segment displays (16 pin)
vmicro16_gpio_apb # (
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .PORTS (`APB_GPIO1_PINS),
    .NAME ("GPIO1")
) gpio1_apb (
    .clk (clk),
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_GPIO1]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_GPIO1*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_GPIO1]),
    .gpio (gpio1)
);

// GPIO2 for Seven segment displays (8 pin)
vmicro16_gpio_apb # (
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .PORTS (`APB_GPIO2_PINS),
    .NAME ("GPIO2")
) gpio2_apb (
    .clk (clk),
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_GPIO2]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_GPIO2*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_GPIO2]),
    .gpio (gpio2)
);

apb_uart_tx # (
    .DATA_WIDTH (8),
    .ADDR_EXP (4) //2^4 = 16 FIFO words
) uart0_apb (
    .clk (clk),
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_UART0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_UART0*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_UART0]),
    // uart wires
    .tx_wire (uart_tx),
    .rx_wire (uart_rx)
);

                                  timer_apb timr0 (
    .clk (clk),
```

```
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_TIMR0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_TIMR0*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_TIMR0])
    //
    `ifdef DEF_ENABLE_INT
    ,.out (ints [`DEF_INT_TIMR0]),
    .int_data (ints_data[`DEF_INT_TIMR0*`DATA_WIDTH +: `DATA_WIDTH])
    `endif
);

// Shared register set for system-on-chip info
// R0 = number of cores
vmicro16_regs_apb # (
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .CELL_DEPTH (8),
    .PARAM_DEFAULTS_R0 (`CORES),
    .PARAM_DEFAULTS_R1 (`SLAVES)

    .S_PSELx (M_PSELx[`APB_PSELX_REGS0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_REGS0*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_REGS0])
);

    .clk (clk),
    .reset (soft_reset),
    // apb slave to master interface
    .S_PADDR (M_PADDR),
    .S_PWRITE (M_PWRITE),
    .S_PSELx (M_PSELx[`APB_PSELX_BRAM0]),
    .S_PENABLE (M_PENABLE),
    .S_PWDATA (M_PWDATA),
    .S_PRDATA (M_PRDATA[`APB_PSELX_BRAM0*`DATA_WIDTH +: `DATA_WIDTH]),
    .S_PREADY (M_PREADY[`APB_PSELX_BRAM0])
);

// There must be at least 1 core
`static_assert(`CORES > 0)
`static_assert(`DEF_MEM_INSTR_DEPTH > 0)
`static_assert(`DEF_MMU_TIM0_CELLS > 0)

// Single instruction memory
`ifndef DEF_CORE_HAS_INSTR_MEM
// slave input/outputs from interconnect
wire [`APB_WIDTH-1:0] instr_M_PADDR;
wire instr_M_PWRITE;
wire [1-1:0] instr_M_PSELx; // not shared
wire instr_M_PENABLE;
wire [`DATA_WIDTH-1:0] instr_M_PWDATA;
wire [1*`DATA_WIDTH-1:0] instr_M_PRDATA; // slave response
wire [1-1:0] instr_M_PREADY; // slave response

// Master apb interfaces
wire [`CORES*`APB_WIDTH-1:0] instr_w_PADDR;
wire [`CORES-1:0] instr_w_PWRITE;
wire [`CORES-1:0] instr_w_PSELx;
wire [`CORES-1:0] instr_w_PENABLE;
wire [`CORES*`DATA_WIDTH-1:0] instr_w_PWDATA;
wire [`CORES*`DATA_WIDTH-1:0] instr_w_PRDATA;
wire [`CORES-1:0] instr_w_PREADY;

`ifdef DEF_USE_REPROG
wire [`clog2(`DEF_MEM_INSTR_DEPTH)-1:0] prog_addr;
wire [`DATA_WIDTH-1:0] prog_data;

    .addr (prog_addr),
    .data (prog_data),
    .we (prog_we),
    .prog (prog_prog)

`ifdef DEF_USE_REPROG
vmicro16_bram_prog_apb
`else
vmicro16_bram_apb
`endif
# (
    .BUS_WIDTH (`APB_WIDTH),
    .MEM_WIDTH (`DATA_WIDTH),
    .MEM_DEPTH (`DEF_MEM_INSTR_DEPTH),
    .USE_INITS (1),
    .NAME ("INSTR_ROM_G")
) instr_rom_apb (
    .clk (clk),
    .reset (reset),
    .S_PADDR (instr_M_PADDR),
    .S_PWRITE (0),
    .S_PSELx (instr_M_PSELx),
    .S_PENABLE (instr_M_PENABLE),
```

```
    .S_PWDATA (0),
    .S_PRDATA (instr_M_PRDATA),
    .S_PREADY (instr_M_PREADY)
    `ifdef DEF_USE_REPROG
    ,
    .addr (prog_addr),
    .data (prog_data),
    .we (prog_we),
    .prog (prog_prog)
    `endif
);

apb_intercon_s # (
    .MASTER_PORTS (`CORES),
    .SLAVE_PORTS (1),
    .BUS_WIDTH (`APB_WIDTH),
    .DATA_WIDTH (`DATA_WIDTH),
    .HAS_PSELX_ADDR (0)
) apb_instr_intercon (
    .clk (clk),
    .reset (soft_reset),
    // APB master from cores
    // master
    .S_PADDR (instr_w_PADDR),
    .S_PWRITE (instr_w_PWRITE),
    .S_PSELx (instr_w_PSELx),
    .S_PENABLE (instr_w_PENABLE),
    .S_PWDATA (instr_w_PWDATA),
    .S_PRDATA (instr_w_PRDATA),
    .S_PREADY (instr_w_PREADY),
    // shared bus slaves
    // slave outputs
    .M_PADDR (instr_M_PADDR),
    .M_PWRITE (instr_M_PWRITE),
    .M_PSELx (instr_M_PSELx),
    .M_PENABLE (instr_M_PENABLE),
    .M_PWDATA (instr_M_PWDATA),
    .M_PRDATA (instr_M_PRDATA),
    .M_PREADY (instr_M_PREADY)
);
`endif

                         genvar i;
generate for(i = 0; i < `CORES; i = i + 1) begin : cores</pre>
418
419
                                  vmicro16 core # (
                                          .CORE_ID
.DATA_WIDTH
420
421
422
423
424
425
                                                                                      (`DATA_WIDTH),
                                           .MEM_INSTR_DEPTH ('DEF_MEM_INSTR_DEPTH),
.MEM_SCRATCH_DEPTH ('DEF_MMU_TIMO_CELLS)
                                 ) c1 (
                                          .clk
                                                                    (clk).
426
427
428
429
430
431
432
433
434
                                          .reset
                                                                     (soft_reset),
                                          // debug
.halt
                                                                    (w_halt[i]),
                                          // interrupts
                                                                    (ints),
                                           .ints
                                          .ints_data (ints_data),
435
436
437
438
439
440
                                         // Output master port 1
.w_PADDR (w_PADDR [`APB_WIDTH*i +: `APB_WIDTH] ),
.w_PWRITE (w_PWRITE [i] ),
.w_PSELX (w_PSELX [i] ),
.w_PENABLE (w_PENABLE [i] ),
.w_PDDATA (w_PENABLE [i] ),
.w_PUDATA (w_PROATA [`DATA_WIDTH*i +: `DATA_WIDTH]),
.w_PREADY (w_PREADY [i] )
441
442
443
444
445
446
447
`ifndef DEF_CORE_HAS_INSTR_MEM
            // APB instruction rom
            , // Output master port 2
            .w2_PADDR   (instr_w_PADDR   [`APB_WIDTH*i +: `APB_WIDTH]  ),
            //.w2_PWRITE  (instr_w_PWRITE  [i]                           ),
            .w2_PSELx   (instr_w_PSELx   [i]                           ),
            .w2_PENABLE (instr_w_PENABLE [i]                           ),
            //.w2_PWDATA  (instr_w_PWDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w2_PRDATA  (instr_w_PRDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w2_PREADY  (instr_w_PREADY  [i]                           )
`endif
        );
    end
    endgenerate
    // Formal Verification
`ifdef FORMAL
    integer i2;
    initial
        for(i2 = 0; i2 < `CORES; i2 = i2 + 1) begin
            bus_core_times[i2] = 0;
            core_work_times[i2] = 0;
        end
    // total bus time
    generate
        genvar g2;
        for (g2 = 0; g2 < `CORES; g2 = g2 + 1) begin : formal_for_times
            always @(posedge clk) begin
                if (w_PSELx[g2])
                    bus_core_times[g2] <= bus_core_times[g2] + 1;
                // Core working time
```

```
`ifndef DEF_CORE_HAS_INSTR_MEM
                if (!w_PSELx[g2] && !instr_w_PSELx[g2])
`else
                if (!w_PSELx[g2])
`endif
                    if (!w_halt[g2])
                        core_work_times[g2] <= core_work_times[g2] + 1;
            end
        end
    endgenerate

    reg [15:0] bus_time_average = 0;
    reg [15:0] bus_reqs_average = 0;
    reg [15:0] fetch_time_average = 0;
    reg [15:0] work_time_average = 0;
    //
    always @(all_halted) begin
        // Sum the per-core counters
        for (i2 = 0; i2 < `CORES; i2 = i2 + 1) begin
            bus_time_average   = bus_time_average   + bus_core_times[i2];
            bus_reqs_average   = bus_reqs_average   + bus_core_reqs_count[i2];
            work_time_average  = work_time_average  + core_work_times[i2];
            fetch_time_average = fetch_time_average + instr_fetch_times[i2];
        end
        // Divide by the core count to get the averages
        bus_time_average   = bus_time_average   / `CORES;
        bus_reqs_average   = bus_reqs_average   / `CORES;
        work_time_average  = work_time_average  / `CORES;
        fetch_time_average = fetch_time_average / `CORES;
    end

    // Count number of bus requests per core
    // Clock delay of w_PSELx
    reg [`CORES-1:0] bus_core_reqs_last;
    // rising edges of each
    wire [`CORES-1:0] bus_core_reqs_real;
    // storage for counters for each core
    reg [15:0] bus_core_reqs_count [0:`CORES-1];
    initial
        for(i2 = 0; i2 < `CORES; i2 = i2 + 1)
            bus_core_reqs_count[i2] = 0;

    // 1 clk delay to detect rising edge
    always @(posedge clk)
        bus_core_reqs_last <= w_PSELx;

    generate
        genvar g3;
        for (g3 = 0; g3 < `CORES; g3 = g3 + 1) begin : formal_for_reqs
            // Detect new reqs for each core
            assign bus_core_reqs_real[g3] = w_PSELx[g3] > bus_core_reqs_last[g3];

            always @(posedge clk)
                if (bus_core_reqs_real[g3])
                    bus_core_reqs_count[g3] <= bus_core_reqs_count[g3] + 1;
        end
    endgenerate

`ifndef DEF_CORE_HAS_INSTR_MEM
    // from global memory
    integer i3;
    initial
        for(i3 = 0; i3 < `CORES; i3 = i3 + 1)
            instr_fetch_times[i3] = 0;

    // total bus time
    // Instruction fetches occur on the w2 master port
    generate
        // ...
    endgenerate
`endif
`endif // end FORMAL
```

#### vmicro16\_isa.v

```
`define VMICRO16_OP_ARITH_USUB  5'b10000
`define VMICRO16_OP_ARITH_UADDI 5'b0????
`define VMICRO16_OP_ARITH_S     5'b00111
`define VMICRO16_OP_ARITH_SADD  5'b11111
`define VMICRO16_OP_ARITH_SSUB  5'b10000
`define VMICRO16_OP_ARITH_SSUBI 5'b0????
`define VMICRO16_OP_BR          5'b01000
`define VMICRO16_OP_CMP         5'b01001
`define VMICRO16_OP_SETC        5'b01010
`define VMICRO16_OP_MULT        5'b01011
`define VMICRO16_OP_LWEX        5'b01101
`define VMICRO16_OP_SWEX        5'b01110

// Special opcodes
`define VMICRO16_OP_SPCL_NOP    11'h000
`define VMICRO16_OP_SPCL_HALT   11'h001
`define VMICRO16_OP_SPCL_INTR   11'h002

// TODO: wasted upper nibble bits
`define VMICRO16_OP_BR_U        8'h00
`define VMICRO16_OP_BR_E        8'h01
`define VMICRO16_OP_BR_NE       8'h02
`define VMICRO16_OP_BR_G        8'h03
`define VMICRO16_OP_BR_GE       8'h04
`define VMICRO16_OP_BR_L        8'h05
`define VMICRO16_OP_BR_LE       8'h06
`define VMICRO16_OP_BR_S        8'h07
`define VMICRO16_OP_BR_NS       8'h08

// flag bit positions
`define VMICRO16_SFLAG_N        4'h03
`define VMICRO16_SFLAG_Z        4'h02
`define VMICRO16_SFLAG_C        4'h01
`define VMICRO16_SFLAG_V        4'h00

// microcode operations
`define VMICRO16_ALU_BIT_OR     5'h00
`define VMICRO16_ALU_BIT_XOR    5'h01
`define VMICRO16_ALU_BIT_AND    5'h02
`define VMICRO16_ALU_BIT_NOT    5'h03
`define VMICRO16_ALU_BIT_LSHFT  5'h04
`define VMICRO16_ALU_BIT_RSHFT  5'h05
`define VMICRO16_ALU_LW         5'h06
`define VMICRO16_ALU_SW         5'h07
`define VMICRO16_ALU_NOP        5'h08
`define VMICRO16_ALU_MOV        5'h09
`define VMICRO16_ALU_MOVI       5'h0a
`define VMICRO16_ALU_MOVI_L     5'h0b
`define VMICRO16_ALU_ARITH_UADD 5'h0c
`define VMICRO16_ALU_ARITH_USUB 5'h0d
`define VMICRO16_ALU_ARITH_SADD 5'h0e
`define VMICRO16_ALU_ARITH_SSUB 5'h0f
`define VMICRO16_ALU_BR_U       5'h10
`define VMICRO16_ALU_BR_E       5'h11
`define VMICRO16_ALU_BR_NE      5'h12
`define VMICRO16_ALU_BR_G       5'h13
`define VMICRO16_ALU_BR_GE      5'h14
`define VMICRO16_ALU_BR_L       5'h15
`define VMICRO16_ALU_BR_LE      5'h16
`define VMICRO16_ALU_BR_S       5'h17
`define VMICRO16_ALU_BR_NS      5'h18
`define VMICRO16_ALU_MULT       5'h1e
`define VMICRO16_ALU_BAD        5'h1f

`endif
```