# Multi-core RISC Processor Design and Implementation (Rev. 2.02)

ELEC5881M - Final Report

Ben David Lancaster

Student ID: 201280376

Submitted in accordance with the requirements for the degree of Master of Science (MSc) in Embedded Systems Engineering

Supervisor: Dr. David Cowell

Assessor: Mr David Moore

**University of Leeds** 

School of Electronic and Electrical Engineering

August 20, 2019

Word count: 4689

#### **Abstract**

This interim report details four months of progress on a project to design, implement, and verify a multi-core RISC processor on an FPGA. The project is split into two stages: first, to build a functional single-core RISC processor; second, to add multiprocessor principles and functionality to it.

Current multiprocessor and network-on-chip communication methods are discussed, along with how they could be incorporated into this multi-core RISC design. To date, a 16-bit instruction set architecture has been designed, featuring common load/store, comparison, and bitwise instructions. A single-core processor has been implemented in Verilog and verified using simulation test benches running various simple software programs.

Future tasks have been planned and will focus on the second stage of the project. Work will start with designing a loosely coupled multiprocessor communication interface and integrating it into the single-core processor.

## **Revision History**

| Date       | Version | Changes                        |
|------------|---------|--------------------------------|
| 10/04/2019 | 2.02    | Update future stages.          |
| 05/04/2019 | 2.01    | Fix processor RTL diagram.     |
| 04/04/2019 | 2.00    | Initial processor RTL diagram. |
| 01/04/2019 | 1.00    | Initial section outline.       |

Document revisions.

## **Declaration of Academic Integrity**

The candidate confirms that the work submitted is his/her own, except where work which has formed part of jointly-authored publications has been included. The contribution of the candidate and the other authors to this work has been explicitly indicated in the report. The candidate confirms that appropriate credit has been given within the report where reference has been made to the work of others.

This copy has been supplied on the understanding that no quotation from the report may be published without proper acknowledgement. The candidate, however, confirms his/her consent to the University of Leeds copying and distributing all or part of this work in any forms and using third parties, who might be outside the University, to monitor breaches of regulations, to verify whether this work contains plagiarised material, and for quality assurance purposes.

The candidate confirms that the details of any mitigating circumstances have been submitted to the Student Support Office at the School of Electronic and Electrical Engineering, at the University of Leeds.

Name: Ben David Lancaster

Date: August 20, 2019

# **Table of Contents**

| Section | Title                            |
|---------|----------------------------------|
| 1       | Introduction                     |
| 1.1     | Why Multi-core?                  |
| 1.2     | Why RISC?                        |
| 1.3     | Why FPGA?                        |
| 2       | Background                       |
| 2.1     | Amdahl's Law and Parallelism     |
| 2.2     | Loosely and Tightly Coupled Processors |
| 2.3     | Network-on-chip Architectures    |
| 3       | Project Overview                 |
| 3.1     | Project Deliverables             |
| 3.1.1   | Core Deliverables (CD)           |
| 3.1.2   | Extended Deliverables (ED)       |
| 3.2     | Project Timeline                 |
| 3.2.1   | Project Stages                   |
| 3.2.2   | Project Stage Detail             |
| 3.2.3   | Timeline                         |
| 3.3     | Resources                        |
| 3.3.1   | Hardware Resources               |
| 3.3.2   | Software Resources               |
| 3.4     | Legal and Ethical Considerations |
| 4       | Single-core Design               |
| 4.1     | Introduction                     |
| 4.2     | Design and Implementation        |
| 4.2.1   | Instruction Set Architecture     |
| 4.2.2   | Instruction and Data Memory      |
| 4.2.3   | Memory Management Unit           |
| 4.2.4   | ALU Design                       |
| 4.2.5   | Decoder Design                   |
| 4.2.6   | Pipelining                       |
| 4.2.7   | Design Optimisations             |
| 4.3     | Interrupts                       |
| 4.3.1   | Overview                         |
| 4.3.2   | Hardware Implementation          |
| 4.3.3   | Software Interface               |
| 4.3.4   | Design Improvements              |
| 4.4     | Verification                     |
| 5       | Interconnect                     |
| 5.1     | Introduction                     |
| 5.1.1   | Comparison of On-chip Buses      |
| 5.2     | Overview                         |
| 5.2.1   | Design Considerations            |
| 5.3     | Interfaces                       |
| 5.3.1   | Master to Slave Interface        |
| 5.3.2   | Multi-master Support             |
| 5.4     | Further Work                     |
| 6       | Memory Mapping                   |
| 6.1     | Introduction                     |
| 6.2     | Address Decoding                 |
| 6.2.1   | Decoder Optimisations            |
| 6.3     | Memory Map                       |
| 7       | Peripherals                      |
| 7.1     | Special Registers                |
| 7.2     | Watchdog Timer                   |
| 7.3     | GPIO Interface                   |
| 7.4     | Timer with Interrupt             |
| 7.5     | UART Interface                   |
| 8       | Multi-core Communication         |
| 8.1     | Introduction                     |
| 8.1.1   | Design Goals                     |
| 8.1.2   | Context Identification           |
| 8.1.3   | Thread Synchronisation           |
| 8.2     | Design Challenges                |
| 8.2.1   | Memory Constraints               |
| 9       | Analysis & Results               |
|         | References                       |
| 8.1.1 Design Goals       45         8.1.2 Context Identification       45         8.1.3 Thread Synchronisation       46         8.2 Design Challenges       48         8.2.1 Memory Constraints       48         9 Analysis & Results       50         References       52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 8  | Mul    | lti-core Communication            | 45 |
| 8.1.2 Context Identification       45         8.1.3 Thread Synchronisation       46         8.2 Design Challenges       48         8.2.1 Memory Constraints       48         9 Analysis & Results       50         References       52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |    | 8.1    | Introduction                      | 45 |
| 8.1.3 Thread Synchronisation       46         8.2 Design Challenges       48         8.2.1 Memory Constraints       48         9 Analysis & Results       50         References       52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |    |        | 8.1.1 Design Goals                | 45 |
| 8.2 Design Challenges       46         8.2.1 Memory Constraints       48         9 Analysis & Results       50         References       52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |    |        | 8.1.2 Context Identification      | 45 |
| 8.2.1 Memory Constraints       48         9 Analysis & Results       50         References       52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |    |        | 8.1.3 Thread Synchronisation      | 46 |
| 9 Analysis & Results  50 References                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |    | 8.2    | Design Challenges                 | 48 |
| References 52                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |    |        | 8.2.1 Memory Constraints          | 48 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 9  | Ana    | alysis & Results                  | 50 |
| Appendices 53                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | Re | eferei | nces                              | 52 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Aı | openo  | dices                             | 53 |


| A | Additional Figures                       | 53 |
|---|------------------------------------------|----|
|   | A.1 Register Set Multiplex               | 53 |
|   | A.2 Instruction Set Architecture         | 54 |
| B | Configuration Options                    | 55 |
|   | B.1 System-on-chip Configuration Options | 55 |
|   | B.2 Core Options                         | 56 |
|   | B.3 Peripheral Options                   | 57 |
| C | Code Listing                             | 58 |
|   | C.1 vmicro16_soc_config.v                | 58 |
|   | C.2 top_ms.v                             | 60 |
|   | C.3 vmicro16_soc.v                       | 61 |
|   | C.4 vmicro16_periph.v                    | 67 |
|   | C.5 vmicro16.v                           |    |

# **List of Figures**

| 2.1 | A loosely coupled multiprocessor system. Each node features its own memory and IO modules and uses a Message Transfer System to perform inter-node communication. Image source: [3] | 13 |
|-----|-----|----|
| 2.2 | A tightly coupled multiprocessor system. Nodes are directly connected to memory and IO modules. Image source: [3] | 13 |
| 2.3 | A multiprocessor network-on-chip architecture with 16 processing nodes. Nodes are connected in a grid formation with routers and links. Image source: [6] | 14 |
| 3.1 | Project stages in a Gantt chart. | 19 |
| 3.2 | Terasic DE1-SoC development board featuring the Altera Cyclone V FPGA and many peripherals. Image source: [10] | 20 |
| 3.3 | Minispartan-6+ development board featuring the Xilinx Spartan 6 XC6SLX9. Note that the XC6SLX9 and XC6SLX25 FPGAs share the same board. Image source: [11] | 21 |
| 4.1 | Vmicro16 RISC 5-stage RTL diagram showing: instruction pipelining (data passed forward through clocked register banks at each stage); branch address calculation; ALU operand calculation (rd2 or imm); and program counter incrementing. | 24 |
| 4.2 | Vmicro16 ALU diagram showing clocked inputs from the previous IDEX stage being |    |
| 4.3 | Time diagram showing the TIMR0 peripheral emitting a 1 µs periodic interrupt signal (out) to the processor. The processor acknowledges the interrupt (int_pending_ack) and enters the interrupt mode (regs_use_int) for a period of time. When the interrupt handler reaches the Interrupt Return instruction (indicated by w_intr) the processor returns to normal mode and restores the normal state | 29 |
| 4.5 | Interrupt Mask register (0x0108). Each bit corresponds to an interrupt source. 1 signifies the interrupt is enabled for/visible to the core. Bits [7:2] are left to the designer to assign. | 30 |
| 4.4 | The interrupt vector consists of eight 16-bit values that point to memory addresses of the instruction memory to jump to. | 30 |
| 5.1 | Waveform showing an APB read transaction | 33 |
| 5.2 | Block diagram of the Vmicro16 system-on-chip | 34 |


| 5.3 | Foo | 36 |
|-----|-----|----|
| 6.1 | Schematic showing the address decoder (addr_dec) accepting the active PADDR signal and outputting PSEL chip enable signals to each peripheral | 39 |
| 6.2 | Example 4-bit binary comparator which compares the bits (a, b, c, d) to the constant value 1010. The 0s of the constant are inverted and then all are passed to a wide-AND. | 39 |
| 6.3 | Bits [7:3] of an 8-bit PADDR signal are used as inputs to 5-bit LUTs to generate a PSEL signal. In addition, a default error case is shown allowing the address decoder to detect incorrect PADDR values (e.g. if no PSEL signals are generated). | 40 |
| 6.4 | Partial address decoding used by the Vmicro16 SoC design. Each peripheral shown only needs to decode a single bit to determine if it is enabled | 40 |
| 6.5 | Memory map showing addresses of various memory sections | 42 |
| 8.1 | Block diagram showing the main multi-processing components: the CPU array and a peripheral interconnect used for core synchronisation | 46 |
| 8.2 | Vmicro16 Special Registers layout (0x0080 - 0x008F) | 46 |
| 8.3 | Assembly code for locking a mutex. r1 is the address to lock. r3 is zero. r4 is the branch address | 47 |
| 8.5 |  | 48 |
| 8.4 | Assembly code for a memory barrier. Threads will wait in the barrier_wait function until all other threads have reached that code point | 49 |
| A.1 | Normal mode and interrupt mode register sets are multiplexed to instantly save the context | 53 |
| A.2 | Vmicro16 instruction set architecture. |    |

# **List of Tables**

| 3.1 | Project stages throughout the life cycle of the project | 18 |
|-----|---------------------------------------------------------|----|
| B.1 | SoC Configuration Options                               | 55 |
| B.2 | Core Options                                            | 56 |
| B.3 | Peripheral Options                                      | 57 |

# **List of Listings**

| 1 | ALU branch detection using flags: zero (Z), overflow (V), and negative (N)  | 26 |
|---|-----------------------------------------------------------------------------|----|
| 2 | Vmicro16's ALU implementation named vmicro16_alu. vmicro16.v                | 26 |
| 3 | Vmicro16's ALU implementation named vmicro16_alu. vmicro16.v                | 27 |
| 4 | Vmicro16's decoder module code showing nested bit switches to determine the intended opcode. vmicro16.v | 27 |
| 5 | Variable size inputs and outputs to the interconnect.                       | 36 |
| 6 | RAM and lock memories instantiated by the shared memory peripheral          | 47 |

## Chapter 1

## Introduction

| 1.1 | Why Multi-core? | 10 |
|-----|-----------------|----|
| 1.2 | Why RISC?       | 11 |
| 1.3 | Why FPGA?       | 11 |

This project details the design, implementation, and verification of a new multi-core RISC processor aimed at FPGA devices. I chose this project because of my interest in processor design: I have previously designed only single-core RISC processors and wish to extend that knowledge to gain a first-hand understanding of multi-core communication, design considerations, and the limitations of parallelism.

I will use this opportunity to further develop my knowledge of FPGA and processor design by designing, implementing, and verifying a multi-core RISC processor from scratch, including a communication interface between multiple cores.

### 1.1 Why Multi-core?

Moore's Law states that the number of transistors on a chip doubles approximately every two years []. CPU designers have used the additional transistors to add more pipeline stages to the processor, reducing the propagation delay of each stage [] and allowing higher clock frequencies.

The size of transistors has been decreasing [] and today they can be manufactured in the sub-10 nanometre range. However, the extremely small transistor size increases electrical leakage and other negative effects, resulting in unreliability and potential damage to the transistor []. The high transistor count produces large amounts of heat and requires increasing power to supply the chip. These trade-offs are currently managed by reducing the input voltage, utilising complex cooling techniques, and reducing clock frequency, all of which limit the performance of the chip significantly and contribute to Moore's Law *slowing* down. The capacity limit of current-generation planar transistors is approaching, so for performance increases to continue, other approaches are employed: alternative transistor technologies such as multigate transistors [1], software and hardware optimisations, and multi-processor architectures.

This report will focus on the latter: to produce a small multi-core processor that can utilise software-based parallelism to gain performance benefits, compared to a larger single-core design.

### 1.2 Why RISC?

RISC architectures feature simpler and fewer instructions compared to CISC, which emphasises instructions that perform larger tasks. A single CISC instruction might require multiple RISC instructions to perform the same task. Because their instructions are fewer and simpler, RISC machines rely heavily on software optimisations for performance. RISC instruction sets are based on load/store architectures, where most instructions are either register-to-register or memory reading and writing [2]. This constraint greatly reduces complexity.

RISC architectures are easier to design and implement, especially for beginners, because their simpler instructions share the same pipeline. In CISC designs there may be a different pipeline for each instruction, which would consume far more FPGA resources.

### 1.3 Why FPGA?

Field programmable gate arrays (FPGA) are a great choice for prototyping digital logic designs due to their programmable nature and quick development times.

My experience with FPGAs in previous projects will reduce risk and learning times and allow more time to be spent on adding and extending features (discussed further in Section 3.1).

FPGAs, however, may not be suitable for prototyping all register-transfer level (RTL) projects. Larger RTL projects, such as large commercial processors, may greatly exceed the logic cell resources available in today's high-end FPGA devices and may only be prototyped through silicon fabrication, which can be expensive. This resource limitation will not be a problem as the project aims to produce a small and minimal design specifically for learning about multi-core architectures.

## Chapter 2

## **Background**

| 2.1 | Amdahl's Law and Parallelism           | 12 |
|-----|----------------------------------------|----|
| 2.2 | Loosely and Tightly Coupled Processors | 12 |
| 2.3 | Network-on-chip Architectures          | 13 |

#### 2.1 Amdahl's Law and Parallelism

In many applications, not restricted to software, there may exist many opportunities for processes or algorithms to be performed in parallel. These algorithms can be split into two parts: a serial part that cannot be parallelised, and a part that can be parallelised. Amdahl's Law defines a formula for calculating the maximum *speedup* of such a process when run in parallel on n processors. Speedup describes the potential performance improvement of an algorithm using an enhanced resource (in this case, additional parallel processors) compared to the original algorithm. Amdahl's Law is defined below, where the potential speedup $S_p$ is dependent on the portion of the program that can be parallelised, p, and the number of processing cores, n:

$$S_p = \frac{1}{(1-p) + \frac{p}{n}} \tag{2.1}$$

This formula will be used throughout the project to gauge the performance of the multi-core design running various software algorithms.
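As a quick illustration of Equation 2.1, the sketch below evaluates the speedup for a range of core counts. The function name `amdahl_speedup` and the parallel fraction `p = 0.9` are assumptions chosen purely for illustration; they are not measurements from this project.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup S_p from Equation 2.1.

    p: fraction of the program that can be parallelised (0 <= p <= 1).
    n: number of processing cores (n >= 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in the range [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8, 16):
        print(f"n={n:2d}  speedup={amdahl_speedup(p, n):.2f}")
    # As n grows, the speedup approaches 1 / (1 - p).
    print(f"limit for large n: {1.0 / (1.0 - p):.1f}")
```

With these assumed values, 16 cores give a speedup of 6.4 rather than 16, and no number of cores can exceed a speedup of 10, because the serial 10% of the program dominates.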

### 2.2 Loosely and Tightly Coupled Processors

Multiprocessor systems can be generalised into two architectures, loosely and tightly coupled, each with advantages and disadvantages. In loosely coupled systems, each processing node is self-contained: each node has its own dedicated memory and IO modules. Communication between nodes is performed over a *Message Transfer System (MTS)* [3] in a master-slave control architecture.

Scalability in loosely coupled systems is generally easier to implement as each node can simply be appended to the shared MTS interface without large modifications to the rest of the system. Scalability is an important concern in this project as I wish to test the developed solution with a range of processing nodes.

As the nodes of a loosely coupled system feature their own memory and IO modules, they generally perform better in cases where interaction between nodes is not prominent: each node can store a separate part of the software program in its memory module, allowing simultaneous execution of the program.

In scenarios where inter-node communication is prominent, however, access to the MTS interface must be scheduled to avoid access conflicts. This scheduling introduces delays and idle time in the software program's execution, resulting in lower throughput. Figure 2.1 shows a general layout of a loosely coupled multiprocessor system.

Tightly coupled systems feature processing nodes that do not have their own dedicated memory or IO modules – each node is directly connected to a shared memory module using a dedicated port. In scenarios where inter-node communication is prominent, tightly coupled systems are generally better suited as nodes are directly connected to a shared memory and do not need to wait to use a shared bus.



**Figure 2.1:** A loosely coupled multiprocessor system. Each node features its own memory and IO modules and uses a Message Transfer System to perform inter-node communication. Image source: [3].

**Figure 2.2:** A tightly coupled multiprocessor system. Nodes are directly connected to memory and IO modules. Image source: [3].

This project will utilise a loosely coupled architecture because it is easier to scale and because of my previous experience with the design of single-core processors. Although it will require a scheduler to access the MTS, the experience and knowledge gained from this task will be greatly beneficial for future projects.

### 2.3 Network-on-chip Architectures

Network-on-chip (NoC) architectures implement on-chip communication mechanisms that are based on network communication principles, such as routing, switching, and massive scalability [4]. NoCs can generally support hundreds to millions of processing cores. Figure 2.3 shows an example 16-core network-on-chip architecture. NoCs can scale to very large sizes while not sacrificing performance because each processor core is able to drive the network rather than needing to wait for a shared bus to become free before doing so.

The greater the number of cores in a network-on-chip design, the greater the quality of service (QoS) problems that arise. As such, network-on-chip architectures suffer the same problems as networks, such as fairness and throughput [5].



**Figure 2.3:** A multiprocessor network-on-chip architecture with 16 processing nodes. Nodes are connected in a grid formation with routers and links. Image source: [6].

## Chapter 3

## **Project Overview**

| Section | Title                        |
|---------|------------------------------|
| 3.1     | Project Deliverables         |
| 3.1.1   | Core Deliverables (CD)       |
| 3.1.2   | Extended Deliverables (ED)   |
| 3.2     | Project Timeline             |
| 3.2.1   | Project Stages               |
| 3.2.2   | Project Stage Detail         |
| 3.2.3   | Timeline                     |
| 3.3     | Resources                    |
| 3.3.1   | Hardware Resources           |
| 3.3.2   | Software Resources           |
| 3.4     | Legal and Ethical Considerations |

This chapter discusses the project's requirements, goals, and structure.

### 3.1 Project Deliverables

The project's deliverables are split into two sections: core deliverables (CD), each of which must be satisfied for the project to be a minimum viable product (MVP); and extended deliverables (ED), which are not required for an MVP and only improve upon an existing feature.

#### 3.1.1 Core Deliverables (CD)

The project's core deliverables are described below.

#### CD1 Design a compact 16-bit RISC instruction set architecture.

The instruction set will be the primary interface to control the processor from software. An instruction set will be required to implement the custom multi-core communication interface.

It was decided to design a new instruction set rather than to extend an existing architecture as this will increase my knowledge of the constraints to consider when designing instruction sets and processors.

#### CD2 Design and implement a Verilog RISC core that implements the ISA in CD1.

The Verilog RISC core will be able to run software programs written for the instruction set architecture.

#### CD3 Design and implement an on-chip interconnect for multi-core processing (2 to 32 cores) using the RISC core from CD2.

The interconnect will be a chief requirement to enable multi-core communication. The interconnect should support up to 32 cores, however FPGA implementation constraints may limit this due to limited resources.

The interconnect will control communication between the cores to enable software parallelism.

#### CD4 Analyse performance of serial and parallel software algorithms, such as parallel DFT, on the processor.

To evaluate the effectiveness of the developed solution, a serial and a parallel implementation of a simple computing algorithm (parallel reduction, sorting) will be run on the processor and their performance analysed. Effectiveness will be rated on total algorithm run-time and the speed-up gained by adding more cores.

#### CD5 Allow the RISC core to be easily compiled to multiple FPGA vendors (Xilinx, Altera).

The developed solution should be generic and portable to allow it to be used across a wide-range of FPGA vendors and devices.

Verilog is a generic, implementation-independent hardware description language, and so designing implementation-specific modules is not recommended.

A key consideration for this requirement is to consider the varying hard IP provided by the FPGA vendors (such as BRAM, ethernet, and PCIe [7, 8]). To overcome this problem, the developed Verilog code will conditionally compile where vendor specific requirements are present.

#### 3.1.2 Extended Deliverables (ED)

The project's extended deliverables are described below.

- **ED1** Design a RISC core with an instructions-per-clock (IPC) rating of at least 1.0 (a single-cycle CPU).
- **ED2** Design a RISC core with a pipelined data path to increase the design's clock speed.
- **ED3** Design a scalable multi-core interconnect supporting arbitrary (more than 32) RISC core instances (manycore) using Network-on-Chip (NoC) architecture.
- **ED4** Design a compiler-backend for the PRCO304 [9] compiler to support the ISA from **CD1**. This will make it easier to build complex multi-core software for the processor.
- **ED5** The RISC core can communicate with peripherals via memory-mapped addresses using the Wishbone bus.
- **ED6** Implement various memory-mapped peripherals such as UART, GPIO, LCD, to aid visual representation of the processor during the demonstration viva.
- **ED7** Store instruction memory in SPI flash.
- **ED8** Reprogram instruction memory at runtime from host computer.
- **ED9** Processor external debugger using host-processor link.

### 3.2 Project Timeline

#### 3.2.1 Project Stages

The project is split into several stages to aid planning and management. There are 8 unique stage areas: 1. Initial project conception; 2. Basic RISC core development; 3. Extended RISC core development; 4. Multi-core development; 5. Processor quality-of-life (QoL) improvements; 6. Compiler development; 7. Demo preparation; and 8. Final report.

The project stages are shown in Table 3.1.

#### 3.2.2 Project Stage Detail

#### Stages 1.0 through 1.2 – Research and Project Conception

These stages cover initial research of existing problems and solutions in the multiprocessor area. The instruction set architecture that later stages will implement is also proposed.

#### Stages 2.1 through 2.3 - Processor module Design, Implementation, and Integration

These stages cover the design, implementation, and integration of key processor core modules such as the instruction decoder, register sets and local memory. Integration of all the modules is a challenging task because some modules have both asynchronous and synchronous signals that need to be timed correctly in order for other modules to receive valid data. An example of this is the register set which has asynchronous read ports that are later clocked in the instruction decode stage.

#### Stages 3.1 through 3.4 – Advanced Processor Implementation

These stages add advanced features to the processor to provide a more functional product. Although these stages are classified as extended, their technical requirement to design and implement is not great and so they have time allocations in the project schedule. The extended features that these stages introduce are: pipelined processor stages – to drastically increase processor performance; provide a memory-mapped peripheral interface through the MMU; provide a Wishbone master interface to the MMU – allowing external peripherals such as GPIO and LCD displays to be utilised in a modular fashion; and to implement a cache memory for each processor core.

| Stage | Title                                        | Start Date | Days | Core | Applicable<br>Deliverables |
|-------|----------------------------------------------|------------|------|------|----------------------------|
| 1.0   | Research                                     | Feb 04     | 7    | x    |                            |
| 1.1   | Requirement gathering/review                 | Feb 11     | 14   | x    |                            |
| 1.1   | Processor specification, architecture, ISA   | Feb 18     | 100  | x    | CD1                        |
| 1.2   | Stage/Time Allocation Planning               | Feb 25     | 7    | x    |                            |
| 2.1   | Decoder, Register Set, impl & integration    | Feb 25     | 14   | x    | CD2                        |
| 2.2   | Register set impl & integration              | Mar 04     | 14   | x    | CD2                        |
| 2.3   | Local memory impl & integration              | Mar 11     | 14   | x    | CD2                        |
| 3.1   | Memory mapped register layout & impl         | Apr 01     | 21   |      | ED5                        |
| 3.2   | Wishbone peripheral bus connected to MMU     | Apr 08     | 21   |      | ED5                        |
| 3.3   | Pipelined implementation and verification    | Apr 15     | 21   |      | ED2                        |
| 3.4   | Cache memory design & impl                   | Apr 22     | 28   |      | ED2                        |
| 4.1   | Multi-core communication interface           | TBD        | TBD  | x    | CD3                        |
| 4.2   | Shared-memory controller                     | TBD        | TBD  | x    | CD3                        |
| 4.3   | Scalable multi-core interface (10s of cores) | TBD        | TBD  | x    | CD3                        |
| 4.4   | Multi-core example program (reduction)       | TBD        | TBD  | x    | CD4                        |
| 5.1   | SPI-FPGA interface for OTG programming       | TBD        | TBD  |      | ED7                        |
| 5.2   | FPGA-PC interfacing                          | TBD        | TBD  |      | ED9                        |
| 5.3   | FPGA-PC debugging (instruction breakpoints)  | TBD        | TBD  |      | ED9                        |
| 6.1   | Compiler backend for vmicro16                | TBD        | TBD  |      | ED4                        |
| 6.2   | Compiler support for multi-core codegen      | TBD        | TBD  |      | ED4                        |
| 7.1   | Wishbone peripherals for demo                | TBD        | TBD  | x    | CD4                        |
| 8.1   | Final Report                                 | TBD        | TBD  | x    |                            |

 Table 3.1: Project stages throughout the life cycle of the project.

#### Stages 4.1 through 4.4 – Multiprocessor Functionality

These stages are dedicated to adding multiprocessor functionality using a loosely coupled architecture to the processor.

#### **Stages 5.1 through 5.3 – Debugging Features**

These stages cover debugging features and are classified as extended due to the large development time required to implement them as well as not being related to multiprocessor systems.

#### Stages 6.1 through 6.2 - Compiler Backends

These stages cover the implementation of a compiler backend to ease software writing and programming of the processor.

#### **Stage 7.1 – Wishbone Peripherals**

Additional Wishbone peripherals, such as SPI and timers will be added to produce a more useful multiprocessor system.

#### Stage 8.1 – Final Report

This stage is dedicated to the final report write-up. It is expected to be an iterative task that is active throughout the lifespan of the project.

#### 3.2.3 Timeline

The project stages from Table 3.1 are displayed below in a Gantt chart.



Figure 3.1: Project stages in a Gantt chart.

#### 3.3 Resources

This section describes the hardware and software resources required to fulfil the project.

#### 3.3.1 Hardware Resources

Core deliverable CD5 requires the designed RISC core to be implemented and demonstrated on multiple FPGA devices. Although my design should synthesise for physical IC implementation, this is not a primary development target due to high costs and lengthy production times. Given my past experience with Xilinx FPGAs from my placement work and with Altera FPGAs from university modules, it was decided to target the Xilinx Spartan 6 XC6SLX9 and the Altera Cyclone V.

#### Terasic DE1-SoC Development Board

The Terasic DE1-SoC development board features a large Cyclone V FPGA and many peripherals, such as seven-segment displays, 64 MB SDRAM, ADCs, and buttons and switches, which will aid demonstration of the project. The development board is available through the university so the cost is negligible. Figure 3.2 shows the peripherals (green) available to the FPGA.



Figure 3.2: Terasic DE1-SoC development board featuring the Altera Cyclone V FPGA and many peripherals. Image source: [10].

#### Minispartan 6+ FPGA Development Board

The Minispartan 6+ is a hobbyist FPGA development board with fewer peripherals than the DE1-SoC. The board features a Xilinx Spartan 6 XC6SLX9, which has far fewer resources than the DE1-SoC's Cyclone V; however, its simplicity and my familiarity with Xilinx's software suite will speed up development. The development board is shown in Figure 3.3.



**Figure 3.3:** Minispartan-6+ development board featuring the Xilinx Spartan 6 XC6SLX9. Note that the XC6SLX9 and XC6SLX25 FPGAs share the same board. Image source: [11].

#### 3.3.2 Software Resources

#### **Intel Quartus**

Intel Quartus Prime is a paid-for SoC, CPLD, and FPGA software suite targeting Intel's Stratix, Arria, and Cyclone based FPGAs. The university provides student licences which will be used via VPN.

#### Xilinx ISE Webpack

Xilinx ISE Webpack is Xilinx's free software suite for FPGA development for Spartan 6 based FPGAs. Due to ISE's intuitive and fast work flow, most of the initial simulation and verification processes will be performed using ISE. This will greatly improve development times.

#### Verilator

Verilator is an open-source Verilog to C++ transpiler which provides a C++ interface to simulate Verilog modules and read/write values similar to a test bench. Verilator will be used for specific modules within the RISC core such as the ALU and decoder as Verilator is useful when performing exhaustive verification.

### 3.4 Legal and Ethical Considerations

The RISC core is designed to be used as an academic research and educational tool to aid learning and understanding of RISC and multi-core machines. It should not be used in roles where mission-critical or safety requirements are a factor.

The processor does not provide any memory protection features and any software running on the processor has full access to all memory.

The processor does not store/track/predict software instructions. The processor uses pipelining techniques to improve performance, which results in future instructions entering the pipeline even if the software's logical sequence does not include these instructions. This could result in security vulnerabilities similar to Intel's Spectre vulnerability [12].

## Chapter 4

## Single-core Design

| Section | Title                        |
|---------|------------------------------|
| 4.1     | Introduction                 |
| 4.2     | Design and Implementation    |
| 4.2.1   | Instruction Set Architecture |
| 4.2.2   | Instruction and Data Memory  |
| 4.2.3   | Memory Management Unit       |
| 4.2.4   | ALU Design                   |
| 4.2.5   | Decoder Design               |
| 4.2.6   | Pipelining                   |
| 4.2.7   | Design Optimisations         |
| 4.3     | Interrupts                   |
| 4.3.1   | Overview                     |
| 4.3.2   | Hardware Implementation      |
| 4.3.3   | Software Interface           |
| 4.3.4   | Design Improvements          |
| 4.4     | Verification                 |

### 4.1 Introduction

While the majority of this report will focus on the multi-processing functionality of this project, it is important to understand the design decisions of the single core in order to understand the features and limitations of the multi-core system-on-chip as a whole.

### 4.2 Design and Implementation

The single-core design is a traditional 5-stage RISC processor (fetch, decode, execute, memory, write-back). The core uses separate instruction and data memories in the style of a Harvard architecture [?].

To satisfy CD5, the Verilog code will be self-contained in a single file. This reduces the hierarchical complexity and eases cross-vendor project set-up as only a single file is required to be included. A disadvantage with this single file approach is that some external Verilog verification tools that I plan to use, such as Verilator, do not currently support multiple Verilog modules (due to an unfixed bug) within a single file.



**Figure 4.1:** Vmicro16 RISC 5-stage RTL diagram showing: instruction pipelining (data passed forward through clocked register banks at each stage); branch address calculation; ALU operand calculation (rd2 or imm); and program counter incrementing.

A small reduction in size within the single-core will result in substantial size reductions in the multi-core design.

#### 4.2.1 Instruction Set Architecture

Core deliverable CD1 details the background for the requirement of a custom instruction set architecture.

The 16-bit instruction set listing is shown in Figure A.2.

Most instructions are *destructive*, meaning that source operands also act as the destination, hence effectively *destroying* the original source operand. This design decision reduces the complexity of the ISA as traditional three operand instructions, for example add r0, r0, r1, can be encoded using only two operands: add r0, r1. However, this does increase the complexity of compilers as they may need to make temporary copies of registers because the instructions will *destroy* the original data.
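The effect of destructive operands on generated code can be sketched with a small Python model. This is only an illustration under stated assumptions: the helper names are hypothetical, and the register-to-register `mov` used for the temporary copy is assumed rather than taken from the instruction listing in Figure A.2.

```python
def add_destructive(regs: list, rd: int, rs: int) -> None:
    """Model of a two-operand add: regs[rd] <- regs[rd] + regs[rs], wrapped to 16 bits."""
    regs[rd] = (regs[rd] + regs[rs]) & 0xFFFF


def mov(regs: list, rd: int, rs: int) -> None:
    """Assumed register-to-register move used to make a temporary copy."""
    regs[rd] = regs[rs]


def add_preserving(regs: list, tmp: int, ra: int, rb: int) -> None:
    """Compute ra + rb into tmp while keeping ra intact, as a compiler might."""
    mov(regs, tmp, ra)              # temporary copy of the first operand
    add_destructive(regs, tmp, rb)  # the add destroys only the copy
```

The extra move is the cost paid for the smaller encoding: whenever the original operand is still needed afterwards, a compiler must emit a copy first.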

The instruction set is split into 7 categories (highlighted by colours in Figure A.2): special instructions, such as halting and interrupt returns; memory operations, such as loading and storing; bitwise operations, such as XOR and AND; unsigned arithmetic; signed arithmetic; conditional branches; and atomic load/store instructions.

#### 4.2.2 Instruction and Data Memory

The design uses separate instruction and data memories, similar to a Harvard architecture computer. This architecture was chosen because it is generally easier to implement; however, it later resulted in design challenges in large multi-core designs. This is discussed later in the report.

#### 4.2.3 Memory Management Unit

It was decided to use a memory management unit (MMU) to make communication with external peripherals or additional registers easier and more extensible. This method transparently uses the existing LW/SW instructions, which removes the requirement for a unique instruction for each peripheral.
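The idea of routing ordinary loads and stores by address range can be sketched with a small behavioural model in Python. This is not the Verilog MMU: the class and method names are hypothetical and the scratch memory range is an assumption. Only the interrupt vector (0x0100-0x0107) and interrupt mask (0x0108) addresses are taken from Section 4.3.3.

```python
class Mmu:
    """Toy model of memory-mapped access: LW/SW addresses are routed by range."""

    def __init__(self):
        self._regions = []  # entries of (start, end_inclusive, backing_list)

    def map_region(self, start, end, storage):
        if len(storage) != end - start + 1:
            raise ValueError("storage size must match the address range")
        self._regions.append((start, end, storage))

    def _find(self, addr):
        for start, end, storage in self._regions:
            if start <= addr <= end:
                return storage, addr - start
        raise ValueError(f"unmapped address 0x{addr:04x}")

    def lw(self, addr):
        storage, offset = self._find(addr)
        return storage[offset]

    def sw(self, addr, value):
        storage, offset = self._find(addr)
        storage[offset] = value & 0xFFFF  # 16-bit data width


def build_example_mmu():
    mmu = Mmu()
    scratch = [0] * 64     # hypothetical scratch RAM at 0x0000-0x003F
    int_vector = [0] * 8   # interrupt vector at 0x0100-0x0107
    int_mask = [0]         # interrupt mask at 0x0108
    mmu.map_region(0x0000, 0x003F, scratch)
    mmu.map_region(0x0100, 0x0107, int_vector)
    mmu.map_region(0x0108, 0x0108, int_mask)
    return mmu, int_vector, int_mask
```

Adding a peripheral in this model only requires another `map_region` call, which mirrors why no new instruction is needed per peripheral.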

#### 4.2.4 ALU Design

The Vmicro16's ALU is an asynchronous module that has three inputs (data a, data b, and opcode op) and one output (data c). The ALU is able to operate on both register data (rd1 and rd2) and immediate values. A switch is used to set the b input to either the rd2 or imm value from the previous stage.



Figure 4.2: Vmicro16 ALU diagram showing clocked inputs from the previous IDEX stage being

The ALU also performs comparison (CMP) operations in which it returns flags similar to x86's overflow, sign, and zero flags. The combination of these flags can be used to easily compute relationships between the two input operands. For example, if the overflow flag is not equal to the sign flag, then the relationship between inputs a and b is that a < b.

```
module branch (
    input      [3:0] flags,
    input      [7:0] cond,
    output reg       en
);
    // Decide whether a branch is taken from the condition code and the CMP flags
    always @(*)
        case (cond)
            `VMICRO16_OP_BR_U:  en = 1; // unconditional
            `VMICRO16_OP_BR_E:  en = (flags[`VMICRO16_SFLAG_Z] == 1);
            `VMICRO16_OP_BR_NE: en = (flags[`VMICRO16_SFLAG_Z] == 0);
            `VMICRO16_OP_BR_G:  en = (flags[`VMICRO16_SFLAG_Z] == 0) &&
                                     (flags[`VMICRO16_SFLAG_N] == flags[`VMICRO16_SFLAG_V]);
            `VMICRO16_OP_BR_L:  en = (flags[`VMICRO16_SFLAG_N] != flags[`VMICRO16_SFLAG_V]);
            `VMICRO16_OP_BR_GE: en = (flags[`VMICRO16_SFLAG_N] == flags[`VMICRO16_SFLAG_V]);
            `VMICRO16_OP_BR_LE: en = (flags[`VMICRO16_SFLAG_Z] == 1) ||
                                     (flags[`VMICRO16_SFLAG_N] != flags[`VMICRO16_SFLAG_V]);
            default:            en = 0;
        endcase
endmodule
```

Listing 1: ALU branch detection using flags: zero (Z), overflow (V), and negative (N).
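The flag-based branch conditions can be checked against ordinary signed comparisons with a small Python model. This is a sketch under stated assumptions: the flags are derived from the subtraction a - b in the conventional way, the condition labels are shorthand rather than the Verilog macro names, and an 8-bit width is used only to keep the exhaustive check fast (the Vmicro16 data width is 16 bits).

```python
WIDTH = 8  # reduced width for a fast exhaustive check; the processor uses 16 bits
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def to_signed(x):
    """Interpret a WIDTH-bit pattern as a two's complement value."""
    return x - (1 << WIDTH) if x & SIGN else x


def cmp_flags(a, b):
    """Return (N, Z, V) for the subtraction a - b on WIDTH-bit operands."""
    r = (a - b) & MASK
    n = bool(r & SIGN)
    z = r == 0
    # Signed overflow: operand signs differ and the result sign differs from a.
    v = bool((a ^ b) & (a ^ r) & SIGN)
    return n, z, v


def branch_taken(cond, flags):
    n, z, v = flags
    return {
        "U": True,
        "E": z,
        "NE": not z,
        "G": (not z) and (n == v),
        "GE": n == v,
        "L": n != v,
        "LE": z or (n != v),
    }[cond]


def check_all_operands():
    """Compare every flag-based condition with Python's own signed comparison."""
    for a in range(1 << WIDTH):
        for b in range(1 << WIDTH):
            flags = cmp_flags(a, b)
            sa, sb = to_signed(a), to_signed(b)
            expected = {"E": sa == sb, "NE": sa != sb, "G": sa > sb,
                        "GE": sa >= sb, "L": sa < sb, "LE": sa <= sb}
            for cond, want in expected.items():
                if branch_taken(cond, flags) != want:
                    return False
    return True
```

Calling `check_all_operands()` walks every operand pair, which is the same style of exhaustive verification that the report later attributes to the Verilator-based test benches.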

The Verilog implementation of the ALU is shown in Listing 2. The ALU's asynchronous output is clocked with other registers, such as destination register rs1 and other control signals, in the EXME register bank.

```
always @(*) case (op)
    // branch/nop, output nothing
    `VMICRO16_ALU_BR,
    `VMICRO16_ALU_NOP:       c = {DATA_WIDTH{1'b0}};
    // load/store addresses (use value in rd2)
    `VMICRO16_ALU_LW,
    `VMICRO16_ALU_SW:        c = b;
    // bitwise operations
    `VMICRO16_ALU_BIT_OR:    c = a | b;
    `VMICRO16_ALU_BIT_XOR:   c = a ^ b;
    `VMICRO16_ALU_BIT_AND:   c = a & b;
    `VMICRO16_ALU_BIT_NOT:   c = ~(b);
    `VMICRO16_ALU_BIT_LSHFT: c = a << b;
    `VMICRO16_ALU_BIT_RSHFT: c = a >> b;
```

Listing 2: Vmicro16's ALU implementation named vmicro16\_alu. vmicro16.v

#### 4.2.5 Decoder Design

Instruction decoding occurs between the IFID and IDEX stages. The decoder extracts register selects and operands from the input instruction. The decoder outputs are asynchronous, which allows the register selects to be passed to the register set and register data to be read asynchronously. The register selects and register read data are then clocked into the IDEX register bank.

Listing 4: Vmicro16's decoder module code showing nested bit switches to determine the intended opcode. vmicro16.v

In Listing 4, it can be seen that the first 4 opcode cases (BR, MULT, CMP, SETC) are identified using only bits 15-11 (the opcode field). The BIT instructions, however, share the same opcode and so require another bit range to be compared to determine the output function.
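The nested decoding described above can be illustrated with a short Python model. The opcode values, the position of the secondary field (bits 4-0), and the function names are all hypothetical and chosen purely for illustration; the real encodings are those listed in Figure A.2.

```python
# Hypothetical encodings for illustration only; see Figure A.2 for the real values.
OP_BR = 0b00001
OP_CMP = 0b00010
OP_BIT = 0b00011

# Assumed secondary field values for instructions that share the BIT opcode.
BIT_FUNCS = {
    0b00000: "BIT_OR",
    0b00001: "BIT_XOR",
    0b00010: "BIT_AND",
}


def decode(instr: int) -> str:
    """Two-level decode: the opcode field first, then a secondary field if needed."""
    opcode = (instr >> 11) & 0x1F   # bits 15-11
    if opcode == OP_BR:
        return "BR"
    if opcode == OP_CMP:
        return "CMP"
    if opcode == OP_BIT:
        func = instr & 0x1F         # assumed secondary field: bits 4-0
        return BIT_FUNCS.get(func, "NOP")
    return "NOP"
```

Instructions with a unique opcode resolve in the first comparison, while the shared opcode needs a second, nested lookup, which is the structure the listing caption describes as nested bit switches.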

#### 4.2.6 Pipelining

In the interim progress update, the processor design featured *instruction pipelining* to meet requirement **ED1**. Instruction pipelining allows the execution of instructions to be overlapped in the pipeline, resulting in higher throughput (up to one instruction per clock) at the expense of 5-6 clocks of latency and code complexity. As the development of the project shifted from single-core to multi-core, it became obvious that the complexity of the pipelined processor would inhibit the integration of multi-core functionality. It was decided to remove the instruction pipelining functionality and use a simpler state-machine based pipeline that is much simpler to extend and would cause fewer challenges later in the project.

#### 4.2.7 Design Optimisations

In a design that has many instantiations of the same component, a small resource saving within the component can produce a significant overall saving. Project requirement CD5 requires the design to be compiled for a range of FPGA sizes, and so space-saving optimisations are considered.

#### **Register Set Size Improvements**

A register set in a CPU is a fast, temporary, and small memory that software instructions directly manipulate to perform computation. In the Vmicro16 instruction set, eight registers named r0 to r7 are available to software. The instruction set allows up to two registers to be referenced in most instructions. For example, the instruction add r0, r1 tells the processor to perform the following actions:

- **Clock 1.** Fetch r0 and r1 from the register set
- **Clock 2.** Add the two values together in the ALU
- **Clock 3.** Store the result back to the register set in r0

For the first step (Clock 1), it was originally decided to use a dual-port register set (meaning that two data reads can be performed in a single clock, in this case r0 and r1), however due to the asynchronous design of the register set (for speed) the RTL produced consumed a significant amount of FPGA resources, approximately 256 flip-flops (16 (data width) \* 8 (registers) \* 2 (ports)). To reduce this, it was decided to split the first step into two steps over two clock cycles using a single-port register set. This required the processor pipeline to use another clock cycle, resulting in slightly lower performance; however, the size improvements will allow more cores to be instantiated in the design. This optimisation is also applied to the interrupt register set, resulting in a saving of approximately 256 flip-flops per core (128 in the normal mode register set, and 128 in the interrupt register set). As shown, adding a single clock delay saves a significant amount of FPGA resources. This saving is amplified when instantiating many cores.
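The resource estimate above is simple arithmetic and can be reproduced with a few lines of Python. The function names are hypothetical, and the model uses the same approximation as the text (one flip-flop per bit, per register, per read port) rather than real synthesis results.

```python
DATA_WIDTH = 16
REGISTERS = 8
REGISTER_SETS_PER_CORE = 2  # normal mode and interrupt mode


def register_set_flip_flops(data_width, registers, read_ports):
    """Approximation from the text: one flip-flop per bit, register, and read port."""
    return data_width * registers * read_ports


def saving_per_core():
    dual = register_set_flip_flops(DATA_WIDTH, REGISTERS, 2)
    single = register_set_flip_flops(DATA_WIDTH, REGISTERS, 1)
    return (dual - single) * REGISTER_SETS_PER_CORE


def saving_for_design(cores):
    """Total estimated saving when the core is instantiated many times."""
    return saving_per_core() * cores
```

Under this approximation, `saving_for_design(32)` gives 8192 flip-flops for the 32-core upper bound named in CD3, which shows how the per-core saving is amplified.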

#### 4.3 Interrupts

Interrupts are a technique used by processors to run software functions when an event occurs within the processor, such as exceptions, or signalled from an external source, such as a UART receiver signalling it has received new data. Today, it is common for micro-controllers, soft-processors, and desktop processors, to all feature interrupts. Modern implementations support an *interrupt vector* which is a memory array that contains addresses to different *interrupt handlers* (a software function called when a particular interrupt is received).

Although interrupts are not a requirement for a multi-core system, it was decided to implement this functionality to boost my understanding of such systems. In addition, example demos provided with this project are better visualised with interrupt functionality.

#### 4.3.1 Overview

The interrupt functionality in this project supports the following:

- Per-core 8 cell interrupt vector accessible to software. Software programs running on the Vmicro16 processor can edit the interrupt vector to add their own interrupt handlers at runtime.
- Fast context switching. A dedicated interrupt register set is multiplexed with the normal mode register set to provide instant context switching. It should be noted that only the registers are saved during a context switch. This means that the stack is not saved. A schematic of the register multiplex is shown in Figure A.1.
- Parametrised interrupt sources and widths. Users can configure the width of the interrupt input signals and the data width per interrupt source via vmicro16\_soc\_config.v. By default, 8 interrupt sources are available and each can provide 8 bits of data.

#### 4.3.2 Hardware Implementation

#### **Context Switching**

When acting upon an incoming interrupt, the current state of the processor must be saved so that changes from the interrupt handler, such as register writes and branches, do not affect the current state. After the interrupt handler function signals it has finished (by using the *Interrupt Return INTR* instruction) the saved state is restored. In the case of the Vmicro16 processor, the program counter r\_pc[15:0] and register set regs instance are the only states that are saved. Going forth, the terms *normal mode* and *interrupt mode* are used to describe which registers the processor should use when executing instructions.

When saving the state, to avoid clocking 128 bits (8 registers of 16 bits) into another register (which would increase timing delays and logic elements), a dedicated register set for the interrupt mode (regs\_isr) is multiplexed with the normal mode register set (regs). Then depending on the mode (identified by the register regs\_use\_int) the processor can easily switch between the two large states without significantly affecting timing.

The timing diagram in Figure 4.3 visually describes this process.



**Figure 4.3:** Timing diagram showing the TIMR0 peripheral emitting a 1 µs periodic interrupt signal (out) to the processor. The processor acknowledges the interrupt (int\_pending\_ack) and enters the interrupt mode (regs\_use\_int) for a period of time. When the interrupt handler reaches the Interrupt Return instruction (indicated by w\_intr) the processor returns to normal mode and restores the normal state.
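The register-set multiplexing described in this section can be modelled in a few lines of Python. This is a behavioural sketch, not the Verilog implementation: the names regs, regs\_isr and regs\_use\_int follow the text, while the class, its methods and the `saved_pc` field are hypothetical.

```python
class CoreState:
    """Toy model of the multiplexed register sets used for context switching."""

    def __init__(self):
        self.regs = [0] * 8        # normal mode register set
        self.regs_isr = [0] * 8    # interrupt mode register set
        self.regs_use_int = False  # selects which register set is visible
        self.pc = 0
        self.saved_pc = 0          # hypothetical name for the saved program counter

    def _active(self):
        return self.regs_isr if self.regs_use_int else self.regs

    def read(self, index):
        return self._active()[index]

    def write(self, index, value):
        self._active()[index] = value & 0xFFFF

    def enter_interrupt(self, handler_address):
        self.saved_pc = self.pc
        self.pc = handler_address
        self.regs_use_int = True   # switch banks; no register data is copied

    def interrupt_return(self):
        self.pc = self.saved_pc
        self.regs_use_int = False  # normal mode registers become visible again
```

Because the switch only flips a select signal, the handler's register writes land in the interrupt bank and the normal mode values are untouched when the handler returns.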

#### 4.3.3 Software Interface

To enable software to



**Figure 4.5:** Interrupt Mask register (0x0108). Each bit corresponds to an interrupt source. 1 signifies the interrupt is enabled for/visible to the core. Bits [7:2] are left to the designer to assign.



Figure 4.4: The interrupt vector consists of eight 16-bit values that point to memory addresses of the instruction memory to jump to.

#### **Interrupt Vector (0x0100-0x0107)**

The interrupt vector is a per-core register that is used to store the addresses of interrupt handlers. An interrupt handler is simply a software function residing in instruction memory that is branched to when a particular interrupt is received.

#### Interrupt Mask (0x0108)

The interrupt mask is a per-core register that is used to mask or listen to specific interrupt sources. This enables processing cores to individually select which interrupts they respond to. This allows for multi-processor designs where each core can be dedicated to a particular interrupt source, improving the response time to the interrupt for time-critical programs. The Interrupt Mask register is an 8-bit read/write register where each bit corresponds to a particular interrupt source and to the handler at the same index in the interrupt vector.
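How the mask and the vector work together can be sketched in Python. The function name is hypothetical, and choosing the lowest-numbered pending source first is an assumption made for this illustration; the report does not state the priority order used by the hardware.

```python
def select_handler(pending: int, mask: int, vector: list):
    """Return (source, handler_address) for an enabled pending interrupt, or None.

    pending and mask are 8-bit values with one bit per interrupt source;
    vector holds the eight handler addresses of the interrupt vector.
    """
    active = pending & mask & 0xFF
    if active == 0:
        return None  # nothing pending that this core listens to
    # Assumed priority: the lowest-numbered active source wins.
    source = (active & -active).bit_length() - 1
    return source, vector[source]
```

A core that sets only one bit in its mask ignores every other source, which is how individual cores can be dedicated to individual interrupt sources.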

#### Software Example

To better understand the usage of the described interrupt registers, a simple software program is described below. The following software program produces a simple and power efficient routine to initialise the interrupt vector and interrupt mask.

```
entry:
    // Set interrupt vector at 0x100
    // Move address of isr0 function to vector[0]
    movi    r0, isr0
    // create 0x100 value by left shifting 1 8 bits
    movi    r1, #0x1
    movi    r2, #0x8
    lshft   r1, r2
    // write isr0 address to vector[0]
    sw      r0, r1
    // enable all interrupts by writing 0x0f to 0x108
    movi    r0, #0x0f
    sw      r0, r1 + #0x8
    halt                    // enter low power idle state

isr0:                       // arbitrary name
    movi    r0, #0xff       // do something
    intr                    // return from interrupt
```

A more complex example software program utilising interrupts and the TIMR0 interrupt is described in section ??.

#### 4.3.4 Design Improvements

The hardware and software interrupt designs have changed over the course of the project. In initial versions of the interrupt implementation, the software program would sit in a tight infinite loop (branching to the same instruction) while waiting for an interrupt. This kept every pipeline stage active during the wait. The resulting logic transitions and memory fetches raise power consumption and temperature, which was particularly noticeable when running on the Spartan-6 LX9 FPGA.

To improve this, it was decided to implement a new state within the processor's state machine that, when entered, did not produce high frequency logic transitions or memory fetches. The HALT instruction was modified to enter this state and the only way to leave is from an interrupt or top-level reset. This removes the need for a software infinite loop that produces high frequency logic transitions (decoding, ALU, register reads, etc.) and memory fetches.

#### 4.4 Verification

Various verification techniques are employed to ensure correct operation of the processor.

The first technique uses static assertions to identify incorrect configuration parameters at compile time, such as an instruction memory or scratch memory depth of zero. These assertions use the static\_assert macro for top-level checks and the static\_assert\_ng macro for checks inside generate blocks.

The second verification technique is to use assertions in always blocks to identify incorrect behavioural states. This is done using the rassert (run-time assert) macro.

The third verification technique is to use automatic verifying test benches. These test benches drive components of the processor, such as the ALU and decoder, and check the output against the correct value. This uses the rassert macro.

The final method of verification is to verify the complete design via a behavioural test bench. The design is given a compiled software program with a known expected output and is run until the r\_halt signal is raised. The test bench then checks the values on the debug0, debug1, and debug2 signals against the expected values. If they match, it is assumed that the sub-components of the design also operate correctly. This technique does not monitor the states of sub-components or statistics (such as the time taken to execute an instruction), so it leaves open the possibility that some components have entered an illegal state.

## Chapter 5

### Interconnect

| 5.1   | Introduction                |
|-------|-----------------------------|
| 5.1.1 | Comparison of On-chip Buses |
| 5.2   | Overview                    |
| 5.2.1 | Design Considerations       |
| 5.3   | Interfaces                  |
| 5.3.1 | Master to Slave Interface   |
| 5.3.2 | Multi-master Support        |
| 5.4   | Further Work                |

#### 5.1 Introduction

The Vmicro16 processor needs to communicate with multiple peripheral modules (such as UART, timers, GPIO, and more) to provide useful functionality for the end user.

In my previous designs, each peripheral was connected directly to a main driver through the unique inputs and outputs that the peripheral required. For example, a timer peripheral would have dedicated wires for its load and prescaler values, wires for enabling and resetting, and wires for reading. A memory peripheral would have wires for its address, read and write data, and a write enable signal. As a result, each peripheral had a unique interface and unique driving logic, which consumed significant amounts of limited FPGA resources.

It can be seen that many of the peripherals need similar inputs and outputs (for example read and write data signals, write enables, and addresses), and because of this, a standard interface can be used to interface with each peripheral. Using a standard interface can reduce logic requirements as each peripheral can be driven by a single driver.

#### 5.1.1 Comparison of On-chip Buses

The choice of on-chip interconnect has changed multiple times over the life-cycle of this project, primarily due to ease of implementation and resource requirements.

Originally, it was planned to use the Wishbone bus [?] due to its popularity within open-source FPGA modules and its good quality documentation.

Late in the project, it was decided to use the AMBA APB protocol [?] instead, as it is more commonly used in large commercial designs and understanding how the interface works would be of greater benefit to me. APB describes an intuitive and easy to implement 2-state interface aimed at communicating with low-throughput devices, such as UARTs, timers, and watchdogs.



Figure 5.1: Waveform showing an APB read transaction.
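The two phases of an APB transfer can be written out cycle by cycle. The Python sketch below lists the bus signals during one read; the signal names follow the AMBA APB specification, while the helper `apb_read_cycles` and its `wait_states` parameter are illustrative assumptions and are not part of the project's source.

```python
# Sketch of the bus signals during one APB read, one dict per clock cycle.
# Signal names follow the AMBA APB specification; the helper is illustrative.
# PREADY is a don't-care in the SETUP phase and is shown as 0 here.

def apb_read_cycles(addr, rdata, wait_states=0):
    """Return the list of per-cycle signal values for a single APB read."""
    cycles = []
    # SETUP phase: slave selected, PENABLE still low, address and direction valid.
    cycles.append(dict(PSEL=1, PENABLE=0, PWRITE=0, PADDR=addr,
                       PREADY=0, PRDATA=None))
    # ACCESS phase: PENABLE high; the slave may hold PREADY low to add wait states.
    for _ in range(wait_states):
        cycles.append(dict(PSEL=1, PENABLE=1, PWRITE=0, PADDR=addr,
                           PREADY=0, PRDATA=None))
    # Final ACCESS cycle: PREADY high and PRDATA valid; the master samples the data.
    cycles.append(dict(PSEL=1, PENABLE=1, PWRITE=0, PADDR=addr,
                       PREADY=1, PRDATA=rdata))
    return cycles
```

A read with no wait states therefore takes two clock cycles, which is what makes the protocol attractive for simple, low-throughput peripherals.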

#### 5.2 Overview

The system-on-chip design is split into 3 main parts: peripheral interconnect (red), CPU array (gray), and the instruction memory interconnect (green).

A block diagram of the system-on-chip is shown in Figure 5.2.



Figure 5.2: Block diagram of the Vmicro16 system-on-chip.

#### 5.2.1 Design Considerations

There are several design issues to consider for this project. These are listed below:

#### • Design size limitations

The target devices for this project are small to medium sized FPGAs (featuring approximately 10,000 to 30,000 logic cells). Because of this, it is important to use a bus interconnect that has a small logic footprint yet is able to scale reasonably well.

#### • Ease of implementation

The interconnect and any peripherals should be easy to implement within a reasonable time.

#### • Scalable

The interconnect should allow for easy scalability of master and slave interfaces with minimal code changes.

#### 5.3 **Interfaces**



#### 5.3.1 Master to Slave Interface

| Bits 20:19 | Bits 18:16 | Bits 15:0  | Signal       |
|------------|------------|------------|--------------|
| LE         | CORE_ID    | Address    | PADDR[20:0]  |
|            |            | Write data | PWDATA[15:0] |
|            |            | Read data  | PRDATA[15:0] |
|            |            | X          | PWRITE[0:0]  |
|            |            |            | PENABLE[0:0] |

#### 5.3.2 Multi-master Support

#### **Design Goals**

DG1. Foo Bing



Figure 5.3: Foo

```
input      [MASTER_PORTS*BUS_WIDTH-1:0]  S_PADDR,
input      [MASTER_PORTS-1:0]            S_PWRITE,
input      [MASTER_PORTS-1:0]            S_PSELx,
input      [MASTER_PORTS-1:0]            S_PENABLE,
input      [MASTER_PORTS*DATA_WIDTH-1:0] S_PWDATA,
output reg [MASTER_PORTS*DATA_WIDTH-1:0] S_PRDATA,
output reg [MASTER_PORTS-1:0]            S_PREADY,
```

Listing 5: Variable size inputs and outputs to the interconnect.



#### 5.4 Further Work

The submitted design is acceptable for a multi-core system as it fulfils the following requirements:

- Supports an arbitrary number of peripherals.
- Supports memory-mapped address decoding.
- Supports multiple master interfaces.

## Chapter 6

# **Memory Mapping**

| 6.1   | Introduction          |
|-------|-----------------------|
| 6.2   | Address Decoding      |
| 6.2.1 | Decoder Optimisations |
| 6.3   | Memory Map            |

The Vmicro16 processor uses a memory-mapping scheme to communicate with peripherals and other cores. This chapter describes the design decisions and implementation of the memory-map used in this project.

#### 6.1 Introduction

Memory mapping is a common technique used by CPUs, micro-controllers, and other system-on-chip devices that enables peripherals and other devices to be accessed via a memory address on a common bus. In a processor use-case, this allows existing instructions (commonly the memory load/store instructions) to be reused to communicate with external peripherals with little additional logic.

### 6.2 Address Decoding

An address decoder is used to determine the peripheral that the address is requesting. The address decoder module, addr\_dec in apb\_intercon.v, takes the 16-bit PADDR from the active APB interface and checks for set bits to determine which peripheral to select. The decoder outputs a chip enable signal PSEL for the selected peripheral. For example, if bit 12 is set in PADDR then the shared memory peripheral's PSEL is set high and others to low. A schematic for the decoder is shown in Figure 6.1.



Figure 6.1: Schematic showing the address decoder (addr\_dec) accepting the active PADDR signal and outputting PSEL chip enable signals to each peripheral.
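The decoder's behaviour can be summarised with a small reference model. The Python sketch below uses the address ranges and PSEL indices listed in Table B.3; it is not the addr\_dec source, it compares complete address ranges rather than individual bits, and it leaves out TIMR0 because that peripheral's address range is not listed in the table.

```python
# Reference model of an address decoder producing a one-hot PSEL vector.
# Address ranges and PSEL indices are taken from Table B.3; TIMR0 is omitted
# because its address range is not listed there.

PERIPHERALS = [
    # (name, PSEL index, start address, end address)
    ("GPIO0", 0, 0x0090, 0x0090),
    ("UART0", 1, 0x00A0, 0x00A1),
    ("REGS0", 2, 0x00B0, 0x00B7),
    ("BRAM0", 3, 0x1000, 0x1FFF),
    ("GPIO1", 4, 0x0091, 0x0091),
    ("GPIO2", 5, 0x0092, 0x0092),
]


def addr_dec(paddr):
    """Return (psel, error): psel is one-hot, error is True if nothing matched."""
    paddr &= 0xFFFF
    for _name, index, start, end in PERIPHERALS:
        if start <= paddr <= end:
            return 1 << index, False
    return 0, True
```

Any address with bit 12 set in the range 0x1000 to 0x1FFF raises only the shared memory select (index 3), matching the example in the text.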

#### **6.2.1** Decoder Optimisations

Performing a 16-bit equality comparison of the PADDR signal against each peripheral memory address consumes a significant amount of logic. Depending on the synthesis tools and FPGA features, a 16-bit comparator might require a fixed 16-bit value input to compare against (where the 0s are inverted) and a wide-AND to reduce and compare [13, 14]. An example 4-bit comparator is shown below in Figure 6.2.



**Figure 6.2:** Example 4-bit binary comparator which compares the bits (a, b, c, d) to the constant value 1010. The 0s of the constant are inverted and then all are passed to a wide-AND.
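The structure in Figure 6.2 can be expressed bit by bit. The following Python sketch is a gate-level view of such a comparator: an inverter on each position where the constant is 0, followed by a wide AND. The function name `equals_constant` is chosen here for illustration.

```python
# Gate-level view of an equality comparator against a fixed constant:
# invert every input bit whose constant bit is 0, then AND all bits together.

def equals_constant(value, constant, width):
    """Return 1 if the width-bit value equals the constant, else 0."""
    result = 1
    for bit in range(width):
        v = (value >> bit) & 1
        c = (constant >> bit) & 1
        term = v if c else (v ^ 1)  # inverter where the constant bit is 0
        result &= term              # wide AND across all bit positions
    return result
```

For the constant 1010, only the input 1010 produces a 1; every other 4-bit input has at least one term equal to 0 and so fails the AND.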

As we are targeting FPGAs, which use LUTs to implement combinatorial logic, we can conveniently use Verilog's == operator on fairly large operands without worrying about consuming too many resources. The targeted FPGA devices in this project, the Cyclone V and Spartan 6, feature 6-input LUTs, each of which holds a 64-entry truth table [15, 16]. Knowing this, we can design the address decoder to utilise the FPGA's LUTs more effectively and reduce its footprint significantly.

We can use part of the PADDR signal as a chip select and the other bits as sub-addresses to interface with the peripheral. The addressing bits are passed into the FPGA's 6-input LUTs which are programmed (via the bitstream) to output 1 or 0 depending on the address. Figure 6.3 below shows a LUT based approach to address decoding which will utilise approximately one ALM/CLB module per peripheral chip select (PSEL) and one for error detection. This method of comparison (LUT based) is utilised in the addr\_dec module in apb\_intercon.v.



**Figure 6.3:** Bits [7:3] of an 8-bit PADDR signal are used as inputs to 5-bit LUTs to generate a PSEL signal. In addition, a default error case is shown allowing the address decoder to detect incorrect PADDR values (e.g. if no PSEL signals are generated).
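A LUT is simply a small truth table, so the scheme in Figure 6.3 can be mimicked in software. In the Python sketch below each PSEL is a 32-entry table indexed by PADDR[7:3]. The peripheral names and base addresses are made up for illustration and do not come from the project.

```python
# Sketch of LUT-based chip-select generation for an 8-bit PADDR (as in Figure 6.3).
# Each PSEL is a 32-entry truth table indexed by PADDR[7:3]; the peripheral
# names and bases below are hypothetical values for illustration only.

def make_lut(select_code):
    """Build a 5-input LUT (32-entry truth table) that fires for one code."""
    return [1 if index == select_code else 0 for index in range(32)]


# The upper five address bits choose the peripheral; the lower three bits
# are the sub-address passed on to it.
LUTS = {
    "periph_a": make_lut(0b10010),  # PADDR 0x90-0x97
    "periph_b": make_lut(0b10100),  # PADDR 0xA0-0xA7
}


def decode(paddr):
    """Return (psel dict, sub_address, error) for an 8-bit address."""
    index = (paddr >> 3) & 0x1F
    psel = {name: lut[index] for name, lut in LUTS.items()}
    error = int(not any(psel.values()))  # default case: no PSEL generated
    return psel, paddr & 0x7, error
```

The error output corresponds to the default case in the figure: it is raised whenever no peripheral's table fires for the presented address.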

The address decoding methods discussed above are examples of *full-address* decoding, where every bit (whether required or not) is compared. It is possible to further reduce the required logic by utilising *partial-address* decoding [17], which compares only the bits needed to tell the addresses apart. For example, if the set bits of address 0x0100 do not conflict with those of any other address (i.e. bit 8 is not set in any other mapped address), then the address decoder needs to examine only bit 8 and can ignore the other bits. This is visualised in Figure 6.4 below. This method is utilised in the MMU's address decoder (module vmicro16\_mmu in vmicro16.v:181). As this is an optimisation per core, significant resources can be saved when a large number of cores are used.



Figure 6.4: Partial address decoding used by the Vmicro16 SoC design. Each peripheral shown only needs to decode a single bit to determine if it is enabled.

### 6.3 Memory Map

The system-on-chip's memory map is shown below in Figure 6.5. The addresses for each peripheral have been carefully chosen for both:

- Easy software access: creating addresses in software requires few instructions (normally one to four MOVI and LSHFT instructions to address 0x0000 to 0xffff), which improves software performance.
- Reduced address decoding logic: most addresses can be decoded using partial decoding techniques.



Figure 6.5: Memory map showing addresses of various memory sections.

## Chapter 7

# **Peripherals**

| 7.1 | Special Registers    |
|-----|----------------------|
| 7.2 | Watchdog Timer       |
| 7.3 | GPIO Interface       |
| 7.4 | Timer with Interrupt |
| 7.5 | UART Interface       |

To provide users with useful functionality, common system-on-chip peripherals were created. This section describes each peripheral and its design decisions.

### 7.1 Special Registers

From the software perspective, it is important for both the developer and software algorithms to know the target system's architecture so that they can make better use of the available resources. Software written for an architecture with N cores should also run on an architecture with M cores. To enable such portability, the software must query the system for information such as the number of processor cores and the current core identifier. Without this information, the developer would be required to produce software for each individual architecture (e.g. an Intel i5 with 4 cores, an Intel i7 with 8 cores, or an NVIDIA GTX 970).

### 7.2 Watchdog Timer

In any multi-threaded system there exists the possibility for a deadlock – a state where all threads are in a waiting state – and algorithm execution is forever blocked. This can occur either by poor software programming or incorrect thread arbitration by the processor. A common method of detecting a deadlock is to make each thread signal that it is not blocked by resetting a countdown timer. If the countdown timer is not reset, it will eventually reach zero and it is assumed that all threads are blocked as none have reset the countdown.

In this system-on-chip design, software can reset the watchdog timer by writing any 16-bit value to the address 0x00B8.

This peripheral is optional and can be enabled using the configuration parameters described in Configuration Options.

| Bits 15:0      | Address | Access |
|----------------|---------|--------|
| Reset Watchdog | 0x00B8  | W      |
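The countdown behaviour described above can be sketched as a small behavioural model. Only the reset address 0x00B8 comes from the text; the class name, the reload value, and the choice to latch the expired flag are assumptions made for this Python illustration.

```python
# Behavioural sketch of a watchdog countdown timer (not the project's RTL).
# Assumptions: the reload value, the class name, and the latched expired flag.
# Only the reset address 0x00B8 comes from the text.

WATCHDOG_ADDR = 0x00B8


class Watchdog:
    def __init__(self, reload=1000):
        self.reload = reload
        self.count = reload
        self.expired = False  # latched once the countdown reaches zero

    def tick(self):
        """Advance one clock cycle; flag a suspected deadlock at zero."""
        if self.count > 0:
            self.count -= 1
        if self.count == 0:
            self.expired = True

    def write(self, addr, value):
        """Any 16-bit write to 0x00B8 restarts the countdown."""
        if addr == WATCHDOG_ADDR:
            self.count = self.reload
```

As long as at least one thread writes to 0x00B8 more often than the reload period, the counter never reaches zero and no deadlock is reported.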

### 7.3 **GPIO** Interface

| Register     | Address | Access |
|--------------|---------|--------|
| GPIO0 Output | 0x0090  | RW     |
| GPIO1 Output | 0x0091  | RW     |
| GPIO1 Input  | 0x0092  | R      |

## 7.4 Timer with Interrupt



### 7.5 UART Interface



## **Chapter 8**

## **Multi-core Communication**

| 8.1   | Introduction           |
|-------|------------------------|
| 8.1.1 | Design Goals           |
| 8.1.2 | Context Identification |
| 8.1.3 | Thread Synchronisation |
| 8.2   | Design Challenges      |
| 8.2.1 | Memory Constraints     |

So far we have discussed the features and design of the Vmicro16 system-on-chip. This section will discuss the multi-processing functionality and how to use it.

#### 8.1 Introduction

Multi-processing functionality is the primary deliverable of this project.

#### 8.1.1 Design Goals

#### • Support common synchronisation primitives.

Software should be able to implement common synchronisation primitives, such as mutexes, semaphores, and memory barriers, to perform atomic operations and avoid race conditions, which are critical in parallel and concurrent software applications.

#### • Context identification.

The SoC should expose configuration information such as: the number of processing cores, amount of shared and scratch memory, and the CORE\_ID, to each thread.

#### 8.1.2 Context Identification

A goal of the multi-processing functionality of this project is to allow software written for it to run on any number of cores. This means that a software program will scale to use all cores in the SoC without needing to be rewritten. To enable this, the software must be able to read contextual information about the SoC, such as the number of cores, how much global and scratch memory is available, and the CORE\_ID of the current core.



Figure 8.1: Block diagram showing the main multi-processing components: the CPU array and a peripheral interconnect used for core synchronisation.

This information is provided through the Special Registers peripheral (0x0080 - 0x008F), shown in Figure 8.1. This register set provides relevant information for writing software that can dynamically scale for various SoC configurations.



Figure 8.2: Vmicro16 Special Registers layout (0x0080 - 0x008F).
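One common way to use this information is to let each core work out its own share of a data set. The Python sketch below shows the arithmetic. On the real system the core identifier and core count would be read from the Special Registers; as the register offsets are not reproduced in this section, they are passed in as plain arguments, and the function name `my_slice` is an assumption.

```python
# Sketch of how software can use CORE_ID and the core count to share work.
# On the real system both values would be read from the Special Registers
# (0x0080-0x008F); here they are passed in as plain arguments.

def my_slice(core_id, num_cores, num_items):
    """Return the range of item indices that this core should process."""
    base, extra = divmod(num_items, num_cores)
    # The first `extra` cores take one additional item each.
    start = core_id * base + min(core_id, extra)
    length = base + (1 if core_id < extra else 0)
    return range(start, start + length)
```

Because the split depends only on the two values read at run time, the same program divides its work correctly on a 2-core or an 8-core build without modification.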

#### 8.1.3 Thread Synchronisation

In multi-threaded software it is important

The mutex functionality is implemented using a similar scheme to that of ARM's *Global Monitor* [?].

#### Mutexes

In software, a mutex is an object used to control access to a shared resource. The term *object* is used because its implementation is normally platform dependent: the processor may provide a hardware mechanism, or it may be left to the operating system to provide one.

In this project, mutexes are provided by the processor through the Shared Memory Peripheral (0x1000 to 0x1FFF), which provides a large RAM-style memory accessible by all cores through the peripheral interconnect bus. This large memory is explicitly defined to use the FPGA's BRAM blocks using Xilinx's Verilog ram\_style="block" attribute to avoid wasting LUTs when using high core counts. The peripheral allows each memory cell to be *locked*, meaning that only the cell owner can modify its contents. This is implemented by using another large memory, locks, to store the CORE\_ID + 1 of the owner, as shown in Listing 6. In this system, a lock containing the value zero indicates an unlocked cell. As CORE\_IDs are indexed from zero, one is added to the CORE\_ID before it is stored. For example, if core two wants to lock a memory cell, the value three is written to the lock.

```
reg [15:0]            ram   [0:8191]; // 16KB large RAM memory
reg [clog2(CORES):0]  locks [0:8191]; // memory cell owner
```

Listing 6: RAM and lock memories instantiated by the shared memory peripheral.

To lock and unlock cells, the LWEX and SWEX instructions are used. These instructions are similar to LW and SW but add locking functionality; the *EX* in their names indicates *exclusive access*. LWEX reads the memory contents (like LW) and also locks the cell if it is not already locked. If a core attempts to lock an already locked cell, the lock does not change. Unlocking is done by the SWEX instruction, which writes to the memory cell only if it is locked by the same core. Unlike SW, SWEX returns a result: zero for success, or one for failure if the cell is locked by another core.
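The locking rules above can be captured in a short behavioural model. The Python sketch below is not the peripheral's Verilog; it makes two assumptions that the text leaves open: a SWEX to a cell that nobody has locked is treated as a failure, and a successful SWEX releases the lock.

```python
# Behavioural model of the lockable shared memory (not the project's RTL).
# Assumptions: SWEX on a cell that nobody has locked is treated as a failure,
# and a successful SWEX releases the lock.

class SharedMemory:
    def __init__(self, cells=4096):
        self.ram = [0] * cells
        self.locks = [0] * cells  # 0 = unlocked, otherwise owner CORE_ID + 1

    def lwex(self, core_id, addr):
        """Read a cell and lock it for core_id if it is currently unlocked."""
        if self.locks[addr] == 0:
            self.locks[addr] = core_id + 1
        return self.ram[addr]

    def swex(self, core_id, addr, value):
        """Write and unlock if core_id owns the cell; 0 on success, 1 on failure."""
        if self.locks[addr] != core_id + 1:
            return 1
        self.ram[addr] = value & 0xFFFF
        self.locks[addr] = 0
        return 0
```

If core two performs an LWEX on a free cell, the lock entry becomes three. A later SWEX from core zero returns one and leaves the cell unchanged, while a SWEX from core two returns zero and stores the value.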

```
lock_mutex:
    // attempt lock
    lwex    r0, r1
    // check success
    swex    r0, r1
    cmp     r0, r3
    // if not equal (NE), retry
    movi    r4, lock_mutex
    br      r4, BR_NE
critical:
    // core has the mutex
```

**Figure 8.3:** Assembly code for locking a mutex. r1 is the address to lock. r3 is zero. r4 is the branch address.

Figure 8.3 shows a simple assembly function to lock a memory cell.

#### **Barriers**

Barriers are a useful software sequence used to block execution until all other threads (or a subset) have reached the same point. Barriers are often used for broadcast and gather actions (sending values to each core or receiving them). They are also used to synchronise program execution if some threads have more work to do than others.

The Vmicro16 processor provides barrier synchronisation through the Shared Memory Peripheral. Like the mutex code, the barrier code uses the LWEX and SWEX instructions to lock a memory cell. Instead of treating the lock as an abstract object, the barrier code treats the cell as a normal memory cell containing a numeric value. Figure 8.4 shows a software example of this. When the barrier\_reached code is reached, it increments the shared memory value (at the address held in r5) by one, indicating that one more thread has reached this point. The barrier\_wait function is then entered, which waits until this value is equal to the number of threads in the system (r7). Once this is true, all threads have reached the barrier\_wait function and can continue with normal program execution.

### 8.2 Design Challenges

#### 8.2.1 Memory Constraints



Figure 8.5: ●

```
barrier_reached:
    // load latest count
    lwex    r0, r5
    // try increment count
    // increment by 1
    addi    r0, r3 + #0x01
    // attempt store
    swex    r0, r5
    // check success (== 0)
    cmp     r0, r3
    // branch if failed
    movi    r4, barrier_reached
    br      r4, BR_NE

barrier_wait:
    // load the count
    lw      r0, r5
    // compare with number of threads
    cmp     r0, r7
    // jump back to barrier if not equal
    movi    r4, barrier_wait
    br      r4, BR_NE
```

**Figure 8.4:** Assembly code for a memory barrier. Threads will wait in the barrier\_wait function until all other threads have reached that code point.
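To check that the sequence in Figure 8.4 lets every thread through exactly once, the barrier can be simulated with interleaved cores. The Python sketch below runs the cores round-robin, one memory operation per turn. The lock model is a simplified assumption (LWEX locks a free cell; SWEX writes and unlocks only for the owner), and all names are illustrative.

```python
# Round-robin simulation of the barrier sequence in Figure 8.4.
# The lock model is a simplified assumption: LWEX locks a free cell, and
# SWEX writes and unlocks only for the owner, returning 0 on success.

def lwex(mem, core):
    if mem["owner"] is None:
        mem["owner"] = core
    return mem["count"]


def swex(mem, core, value):
    if mem["owner"] != core:
        return 1
    mem["count"], mem["owner"] = value, None
    return 0


def core_program(mem, core, num_threads, log):
    """Generator for one core; each yield hands the bus to the next core."""
    while True:                         # barrier_reached
        count = lwex(mem, core)
        yield
        failed = swex(mem, core, count + 1)
        yield
        if not failed:
            break
    while mem["count"] != num_threads:  # barrier_wait
        yield
    log.append(core)                    # this core is past the barrier


def run(num_threads):
    """Run all cores to completion; return the final count and release order."""
    mem, log = {"count": 0, "owner": None}, []
    cores = [core_program(mem, c, num_threads, log) for c in range(num_threads)]
    while cores:
        for prog in list(cores):
            try:
                next(prog)
            except StopIteration:
                cores.remove(prog)
    return mem["count"], log
```

In this model the count only ever advances by one per successful SWEX, so no core leaves barrier\_wait until every core has incremented the cell.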

# **Chapter 9**

# **Analysis & Results**


#### References

- [1] V. Subramanian, "Multiple gate field-effect transistors for future CMOS technologies," *IETE Technical review*, vol. 27, no. 6, pp. 446–454, 2010.
- [2] M. J. Flynn, Computer architecture: Pipelined and parallel processor design. Jones & Bartlett Learning, 1995.
- [3] Tech Differences, "Difference between loosely coupled and tightly coupled multiprocessor system (with comparison chart)," Jul 2017. [Online]. Available: https://techdifferences.com/difference-between-loosely-coupled-and-tightly-coupled-multiprocessor-system.html (Accessed 2019-04-20).
- [4] L. Benini and G. De Micheli, "Networks on Chips: A new SoC paradigm," *Computer*, vol. 35, pp. 70–78, 02 2002.
- [5] D. Zhu, L. Chen, S. Yue, T. M. Pinkston, and M. Pedram, "Balancing On-Chip Network Latency in Multi-application Mapping for Chip-Multiprocessors," in 2014 IEEE 28th International Parallel and Distributed Processing Symposium, May 2014, pp. 872–881.
- [6] N. Chatterjee, S. Paul, and S. Chattopadhyay, "Fault-tolerant dynamic task mapping and scheduling for network-on-chip-based multicore platform," *ACM Transactions on Embedded Computing Systems*, vol. 16, pp. 1–24, 05 2017.
- [7] Xilinx, Spartan-6 FPGA Block RAM Resources, Xilinx.
- [8] Altera, Recommended HDL Coding Styles QII51007-9.0.0, Altera.
- [9] B. Lancaster, "FPGA-based RISC Microprocessor and Compiler," vol. 3.14, pp. 37–50. [Online]. Available: https://github.com/bendl/prco304 (Accessed March 2018).
- [10] Terasic Technologies, "SoC Platform Cyclone DE1-SoC Board." [Online]. Available: https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=836 (Accessed 2019-04-20).
- [11] *MiniSpartan6*+, Scarab Hardware, 2014. [Online]. Available: https://www.scarabhardware.com/minispartan6/ (Accessed 2019-04-20).
- [12] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," arXiv preprint arXiv:1801.01203, 2018.
- [13] A. Palchaudhuri and R. S. Chakraborty, High Performance Integer Arithmetic Circuit Design on FPGA: Architecture, Implementation and Design Automation. Springer, 2015, vol. 51.
- [14] V. Salauyou and M. Gruszewski, "Designing of hierarchical structures for binary comparators on fpga/soc," in *IFIP International Conference on Computer Information Systems and Industrial Management*. Springer, 2015, pp. 386–396.
- [15] Xilinx, Spartan-6 FPGA Configurable Logic Block User Guide UG384, Xilinx.
- [16] Altera, Cyclone V Device Handbook - Device Interfaces and Integration - CV-5V2, Altera.
- [17] A. S. Tanenbaum, Structured Computer Organization. Pearson Education India, 2016.

# Appendix A

# **Additional Figures**

## A.1 Register Set Multiplex



Figure A.1: Normal mode and interrupt mode register sets are multiplexed to instantly save the context.

## A.2 Instruction Set Architecture

| Format             | Bit fields            |
|--------------------|-----------------------|
| rd ra simm5        | 15-11, 10-8, 7-5, 4-0 |
| rd imm8            | 15-11, 10-8, 7-0      |
| nop                | 15-11                 |
| extended immediate | 15, 14:12, 11:0       |

| Instruction | 15-11 | 10-8           | 7-0       | Operation                           |
|-------------|-------|----------------|-----------|-------------------------------------|
| SPCL        | 00000 | 11 bits (10-0) |           | NOP                                 |
| SPCL        | 00000 | 11h'000        |           | NOP                                 |
| SPCL        | 00000 | 11h'001        |           | HALT                                |
| SPCL        | 00000 | 11h'002        |           | Return from interrupt               |
| LW          | 00000 | Rd             | Ra, s5    | Rd <= RAM[Ra+s5]                    |
| SW          | 00001 | Rd             | Ra, s5    | RAM[Ra+s5] <= Rd                    |
| BIT         | 00011 | Rd             | Ra, s5    | bitwise operations                  |
| BIT_OR      | 00011 | Rd             | Ra, 00000 | Rd <= Rd \| Ra                      |
| BIT_XOR     | 00011 | Rd             | Ra, 00001 | Rd <= Rd ^ Ra                       |
| BIT_AND     | 00011 | Rd             | Ra, 00001 | Rd <= Rd & Ra                       |
| BIT_NOT     | 00011 | Rd             | Ra, 00011 | Rd <= ~Ra                           |
| BIT_LSHFT   | 00011 | Rd             | Ra, 00100 | Rd <= Rd << Ra                      |
| BIT_RSHFT   | 00011 | Rd             | Ra, 00100 | Rd <= Rd >> Ra                      |
| MOV         | 00100 | Rd             | Ra, X     | Rd <= Ra                            |
| MOVI        | 00101 | Rd             | i8        | Rd <= i8                            |
| ARITH_U     | 00110 | Rd             | Ra, s5    | unsigned arithmetic                 |
| ARITH_UADD  | 00110 | Rd             | Ra, 11111 | Rd <= uRd + uRa                     |
| ARITH_USUB  | 00110 | Rd             | Ra, 10000 | Rd <= uRd - uRa                     |
| ARITH_UADDI | 00110 | Rd             | Ra, 0AAAA | Rd <= uRd + Ra + AAAA               |
| ARITH_S     | 00111 | Rd             | Ra, s5    | signed arithmetic                   |
| ARITH_SADD  | 00111 | Rd             | Ra, 11111 | Rd <= sRd + sRa                     |
| ARITH_SSUB  | 00111 | Rd             | Ra, 10000 | Rd <= sRd - sRa                     |
| ARITH_SSUBI | 00111 | Rd             | Ra, 0AAAA | Rd <= sRd - sRa + AAAA              |
| BR          | 01000 | Rd             | imm8      | conditional branch                  |
| BR_U        | 01000 | Rd             | 0000 0000 | Any                                 |
| BR_E        | 01000 | Rd             | 0000 0001 | Z=1                                 |
| BR_NE       | 01000 | Rd             | 0000 0010 | Z=0                                 |
| BR_G        | 01000 | Rd             | 0000 0011 | Z=0 and S=O                         |
| BR_GE       | 01000 | Rd             | 0000 0100 | S=O                                 |
| BR_L        | 01000 | Rd             | 0000 0101 | S != O                              |
| BR_LE       | 01000 | Rd             | 0000 0110 | Z=1 or (S != O)                     |
| BR_S        | 01000 | Rd             | 0000 0111 | S=1                                 |
| BR_NS       | 01000 | Rd             | 0000 1000 | S=0                                 |
| CMP         | 01001 | Rd             | Ra, X     | SZO <= CMP(Rd, Ra)                  |
| SETC        | 01010 | Rd             | Imm8      | Rd <= (Imm8 _f_ SZO) ? 1 : 0        |
| MULT        | 01011 | Rd             | Ra, X     | Rd <= uRd * uRa                     |
| HALT        | 01100 | X              | X         |                                     |
| LWEX        | 01101 | Rd             | Ra, s5    | Rd <= RAM[Ra+s5]                    |
| SWEX        | 01110 | Rd             | Ra, s5    | RAM[Ra+s5] <= Rd; Rd <= 0 \| 1 if success |

**Figure A.2:** Vmicro16 instruction set architecture.

# Appendix B

# **Configuration Options**

| B.1 | System-on-chip Configuration Options |
|-----|--------------------------------------|
| B.2 | Core Options                         |
| B.3 | Peripheral Options                   |

The following configuration options are defined in vmicro16\_soc\_config.v.

## **B.1** System-on-chip Configuration Options

| Macro            | Default | Purpose                                                       |
|------------------|---------|---------------------------------------------------------------|
| CORES            | 4       | Number of CPU cores in the SoC                                |
| SLAVES           | 7       | Number of peripherals                                         |
| DEF_USE_WATCHDOG |         | Enable watchdog module to detect deadlocks and infinite loops |

Table B.1: SoC Configuration Options

## **B.2** Core Options

| Macro                  | Default | Purpose                                                           |
|------------------------|---------|-------------------------------------------------------------------|
| DATA_WIDTH             | 16      | Width of CPU registers in bits                                    |
| DEF_CORE_HAS_INSTR_MEM | //      | Enable a per core instruction memory cache                        |
| DEF_MEM_INSTR_DEPTH    | 64      | Instruction memory cache per core                                 |
| DEF_MEM_SCRATCH_DEPTH  | 64      | RW RAM per core                                                   |
| DEF_ALU_HW_MULT        | 1       | Enable/disable HW multiply (1 clock)                              |
| FIX_T3                 | //      | Enable a T3 state for the APB transaction                         |
| DEF_GLOBAL_RESET       | //      | Enable synchronous reset logic                                    |
| DEF_USE_REPROG         | //      | Programme instruction memory via UART0. Requires DEF_GLOBAL_RESET |

Table B.2: Core Options

## **B.3** Peripheral Options

| Macro           | Default  | Purpose                                             |
|-----------------|----------|-----------------------------------------------------|
| APB_WIDTH       |          | AMBA APB PADDR signal width                         |
| APB_PSELX_GPIO0 | 0        | GPIO0 index                                         |
| APB_PSELX_UART0 | 1        | UART0 index                                         |
| APB_PSELX_REGS0 | 2        | REGS0 index                                         |
| APB_PSELX_BRAM0 | 3        | BRAM0 index                                         |
| APB_PSELX_GPIO1 | 4        | GPIO1 index                                         |
| APB_PSELX_GPIO2 | 5        | GPIO2 index                                         |
| APB_PSELX_TIMR0 | 6        | TIMR0 index                                         |
| APB_BRAM0_CELLS | 4096     | Shared memory words                                 |
| DEF_MMU_TIM0_S  | 16'h0000 | Per core scratch memory start/end address           |
| DEF_MMU_TIM0_E  | 16'h007F | "                                                   |
| DEF_MMU_SREG_S  | 16'h0080 | Per core special registers start/end address        |
| DEF_MMU_SREG_E  | 16'h008F | "                                                   |
| DEF_MMU_GPIO0_S | 16'h0090 | Shared GPIOn start/end address                      |
| DEF_MMU_GPIO0_E | 16'h0090 | "                                                   |
| DEF_MMU_GPIO1_S | 16'h0091 | "                                                   |
| DEF_MMU_GPIO1_E | 16'h0091 | "                                                   |
| DEF_MMU_GPIO2_S | 16'h0092 | "                                                   |
| DEF_MMU_GPIO2_E | 16'h0092 | "                                                   |
| DEF_MMU_UART0_S | 16'h00A0 | Shared UART start/end address                       |
| DEF_MMU_UART0_E | 16'h00A1 | "                                                   |
| DEF_MMU_REGS0_S | 16'h00B0 | Shared registers start/end address                  |
| DEF_MMU_REGS0_E | 16'h00B7 | "                                                   |
| DEF_MMU_BRAM0_S | 16'h1000 | Shared memory with global monitor start/end address |
| DEF_MMU_BRAM0_E | 16'h1FFF | "                                                   |
| DEF_MMU_TIMR0_S | 16'h0200 | Shared timer peripheral start/end address           |
| DEF_MMU_TIMR0_E | 16'h0202 | "                                                   |

 Table B.3: Peripheral Options
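
To make the start/end address pairs in Table B.3 concrete, the module below sketches how an address could be matched against a few of the ranges. The module name and its ports are hypothetical and are not part of the vmicro16 sources; the actual MMU may decode addresses differently. Only the range values are taken from the table.

```verilog
// Hypothetical helper, not part of the vmicro16 sources: flags which of four
// Table B.3 regions a 16-bit address falls into (inclusive start/end ranges).
module mmu_region_sketch (
    input  wire [15:0] addr,
    output wire        sel_scratch,   // per-core scratch memory (TIM0)
    output wire        sel_sreg,      // per-core special registers
    output wire        sel_uart0,     // shared UART0
    output wire        sel_bram0      // shared memory with global monitor
);
    // Range values copied from Table B.3.
    localparam [15:0] TIM0_S  = 16'h0000, TIM0_E  = 16'h007F;
    localparam [15:0] SREG_S  = 16'h0080, SREG_E  = 16'h008F;
    localparam [15:0] UART0_S = 16'h00A0, UART0_E = 16'h00A1;
    localparam [15:0] BRAM0_S = 16'h1000, BRAM0_E = 16'h1FFF;

    assign sel_scratch = (addr >= TIM0_S)  && (addr <= TIM0_E);
    assign sel_sreg    = (addr >= SREG_S)  && (addr <= SREG_E);
    assign sel_uart0   = (addr >= UART0_S) && (addr <= UART0_E);
    assign sel_bram0   = (addr >= BRAM0_S) && (addr <= BRAM0_E);
endmodule
```

With these ranges, address 16'h00A1 asserts only sel\_uart0 and address 16'h1234 asserts only sel\_bram0.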

# Appendix C

# **Code Listing**

| Listing | File                  |
|---------|-----------------------|
| C.1     | vmicro16_soc_config.v |
| C.2     | top_ms.v              |
| C.3     | vmicro16_soc.v        |
| C.4     | vmicro16_periph.v     |
| C.5     | vmicro16.v            |

### C.1 vmicro16\_soc\_config.v

Configuration file that selects the features of vmicro16\_soc.v and vmicro16.v.

```
// Configuration defines for the vmicro16_soc and vmicro16 cpu.

`ifndef VMICRO16_SOC_CONFIG_H
`define VMICRO16_SOC_CONFIG_H

`include "clog2.v"

`define FORMAL

`define CORES   4
`define SLAVES  7

// Per core instruction memory
//   Set this to give each core its own instruction memory cache
`define DEF_CORE_HAS_INSTR_MEM

// Top level data width for registers, memory cells, bus widths
`define DATA_WIDTH 16

// Set this to use a workaround for the MMU's APB T2 clock
//`define FIX_T3

// Instruction memory (read only)
//   Must be large enough to support software program.
`ifdef DEF_CORE_HAS_INSTR_MEM
    // 64 16-bit words per core
    `define DEF_MEM_INSTR_DEPTH 64
`else
    // 4096 16-bit words global
    `define DEF_MEM_INSTR_DEPTH 4096
`endif

// Scratch memory (read/write) on each core.
//   See `DEF_MMU_TIM0_* defines for info.
`define DEF_MEM_SCRATCH_DEPTH 64

// Enables hardware multiplier and mult rr instruction
`define DEF_ALU_HW_MULT 1

// Enables global reset (requires more luts)
//`define DEF_GLOBAL_RESET

// Enable a watch dog timer to reset the soc if threadlocked
//`define DEF_USE_WATCHDOG

// Enables instruction memory programming via UART0
//`define DEF_USE_REPROG

`ifdef DEF_USE_REPROG
    `ifndef DEF_GLOBAL_RESET
        `error_DEF_USE_REPROG_requires_DEF_GLOBAL_RESET
    `endif
`endif

`define APB_PSELX_GPIO0 0
`define APB_PSELX_UART0 1
`define APB_PSELX_REGS0 2
`define APB_PSELX_BRAM0 3
`define APB_PSELX_GPIO1 4
`define APB_PSELX_GPIO2 5
`define APB_PSELX_TIMR0 6
`define APB_PSELX_WDOG0 7

`define APB_GPIO0_PINS  8
`define APB_GPIO1_PINS  16
`define APB_GPIO2_PINS  8

// Shared memory words
`define APB_BRAM0_CELLS 4096

// TIM0
// Number of scratch memory cells per core
`define DEF_MMU_TIM0_CELLS  64
`define DEF_MMU_TIM0_S      16'h0000
`define DEF_MMU_TIM0_E      16'h007F
// SREG
`define DEF_MMU_SREG_S      16'h0080
`define DEF_MMU_SREG_E      16'h008F
// GPIO0
`define DEF_MMU_GPIO0_S     16'h0090
`define DEF_MMU_GPIO0_E     16'h0090
// GPIO1
`define DEF_MMU_GPIO1_S     16'h0091
`define DEF_MMU_GPIO1_E     16'h0091
// GPIO2
`define DEF_MMU_GPIO2_S     16'h0092
`define DEF_MMU_GPIO2_E     16'h0092
// UART0
`define DEF_MMU_UART0_S     16'h00A0
`define DEF_MMU_UART0_E     16'h00A1
// REGS0
`define DEF_MMU_REGS0_S     16'h00B0
`define DEF_MMU_REGS0_E     16'h00B7
// WDOG0
`define DEF_MMU_WDOG0_S     16'h00B8
`define DEF_MMU_WDOG0_E     16'h00B8
// BRAM0
`define DEF_MMU_BRAM0_S     16'h1000
`define DEF_MMU_BRAM0_E     16'h1fff
// TIMR0
`define DEF_MMU_TIMR0_S     16'h0200
`define DEF_MMU_TIMR0_E     16'h0202

// Interrupts

// Enable/disable interrupts
//   Disabling will free up resources for other features
`define DEF_ENABLE_INT
// Number of interrupt in signals
`define DEF_NUM_INT         8
// Default interrupt bitmask (0 = hidden, 1 = enabled)
`define DEF_INT_MASK        0
// Bit position of the TIMR0 interrupt signal
`define DEF_INT_TIMR0       0
// Interrupt vector memory location
`define DEF_MMU_INTSV_S     16'h0100
`define DEF_MMU_INTSV_E     16'h0107
// Interrupt vector memory location
`define DEF_MMU_INTSM_S     16'h0108
`define DEF_MMU_INTSM_E     16'h0108

`endif
```

### C.2 top\_ms.v

Top-level module that connects the SoC design to the hardware pins of the FPGA.

```
module seven_display # (
    parameter INVERT = 1
) (
    input  [3:0] n,
    output [6:0] segments
);
    // Segment order is gfedcba, active high before the optional inversion
    reg [6:0] bits;
    assign segments = (INVERT ? ~bits : bits);

    always @(n)
    case (n)
        4'h0: bits = 7'b0111111; // 0
        4'h1: bits = 7'b0000110; // 1
        4'h2: bits = 7'b1011011; // 2
        4'h3: bits = 7'b1001111; // 3
        4'h4: bits = 7'b1100110; // 4
        4'h5: bits = 7'b1101101; // 5
        4'h6: bits = 7'b1111101; // 6
        4'h7: bits = 7'b0000111; // 7
        4'h8: bits = 7'b1111111; // 8
        4'h9: bits = 7'b1100111; // 9
        4'hA: bits = 7'b1110111; // A
        4'hB: bits = 7'b1111100; // B
        4'hC: bits = 7'b0111001; // C
        4'hD: bits = 7'b1011110; // D
        4'hE: bits = 7'b1111001; // E
        4'hF: bits = 7'b1110001; // F
    endcase
endmodule

// minispartan6+ XC6SLX9
module top_ms # (
    parameter GPIO_PINS = 8
) (
    input           CLK50,
    input  [3:0]    SW,
    // UART
    input           RXD,
    output          TXD,
    // Peripherals
    output [7:0]    LEDS,

    // 3v3 input from the s6 on the delsoc
    input           S6_3v3,

    // SSDs
    output [6:0]    ssd0,
    output [6:0]    ssd1,
    output [6:0]    ssd2,
    output [6:0]    ssd3,
    output [6:0]    ssd4,
    output [6:0]    ssd5
);
    //wire [15:0]   M_PADDR;
    //wire          M_PWRITE;
    //wire [5-1:0]  M_PSELx;  // not shared
    //wire          M_PENABLE;
    //wire [15:0]   M_PWDATA;
    //wire [15:0]   M_PRDATA; // input to intercon
    //wire          M_PREADY; // input to intercon

    wire [7:0]  gpio0;
    wire [15:0] gpio1;
    wire [7:0]  gpio2;

    vmicro16_soc soc (
        .clk        (CLK50),
        .reset      (~SW[0]),

        //.M_PADDR    (M_PADDR),
        //.M_PWRITE   (M_PWRITE),
        //.M_PSELx    (M_PSELx),
        //.M_PENABLE  (M_PENABLE),
        //.M_PWDATA   (M_PWDATA),
        //.M_PRDATA   (M_PRDATA),
        //.M_PREADY   (M_PREADY),

        // UART
        .uart_tx    (TXD),
        .uart_rx    (RXD),

        .gpio0      (LEDS[3:0]),
        .gpio1      (gpio1),
        .gpio2      (gpio2),

        // DBUG
        .dbug0      (LEDS[4])
        //.dbug1    (LEDS[7:4])
    );

    assign LEDS[7:5] = {TXD, RXD, S6_3v3};

    // SSD displays (split across 2 gpio ports 1 and 2)
    wire [3:0] ssd_chars [0:5];
    assign ssd_chars[0] = gpio1[3:0];
    assign ssd_chars[1] = gpio1[7:4];
    assign ssd_chars[2] = gpio1[11:8];
    assign ssd_chars[3] = gpio1[15:12];
    assign ssd_chars[4] = gpio2[3:0];
    assign ssd_chars[5] = gpio2[7:4];

    seven_display ssd_0 (.n(ssd_chars[0]), .segments (ssd0));
    seven_display ssd_1 (.n(ssd_chars[1]), .segments (ssd1));
    seven_display ssd_2 (.n(ssd_chars[2]), .segments (ssd2));
    seven_display ssd_3 (.n(ssd_chars[3]), .segments (ssd3));
    seven_display ssd_4 (.n(ssd_chars[4]), .segments (ssd4));
    seven_display ssd_5 (.n(ssd_chars[5]), .segments (ssd5));
endmodule
```

### C.3 vmicro16\_soc.v

```
`include "vmicro16_soc_config.v"
`include "clog2.v"
`include "formal.v"

module pow_reset # (
    parameter INIT = 1,
    parameter N    = 8  // matches the value used by por_inst below
) (
    input       clk,
    input       reset,
    output reg  resethold
);
    initial resethold = INIT ? (N-1) : 0;

    always @(*)
        resethold = |hold;

    reg [`clog2(N)-1:0] hold = (N-1);
    always @(posedge clk)
        if (reset)
            hold <= N-1;
        else
            if (hold)
                hold <= hold - 1;
endmodule

// Vmicro16 multi-core SoC with various peripherals
// and interrupts
module vmicro16_soc (
    input clk,
    input reset,

    // UART0
    input                           uart_rx,
    output                          uart_tx,

    output [`APB_GPIO0_PINS-1:0]    gpio0,
    output [`APB_GPIO1_PINS-1:0]    gpio1,
    output [`APB_GPIO2_PINS-1:0]    gpio2,

    output                          halt,

    output [`CORES-1:0]             dbug0,
    output [`CORES*8-1:0]           dbug1
);
    wire [`CORES-1:0] w_halt;
    assign halt = &w_halt;

    assign dbug0 = w_halt;

    // Watchdog reset pulse signal.
    //   Passed to pow_reset to generate a longer reset pulse
    wire wdreset;
    wire prog_prog;

    // soft register reset hold for brams and registers
    wire soft_reset;
    `ifdef DEF_GLOBAL_RESET
        pow_reset # (
            .INIT       (1),
            .N          (8)
        ) por_inst (
            .clk        (clk),
            `ifdef DEF_USE_WATCHDOG
            .reset      (reset | wdreset | prog_prog),
            `else
            .reset      (reset),
            `endif
            .resethold  (soft_reset)
        );
    `else
        assign soft_reset = 0;
    `endif

    // Peripherals (master to slave)
    wire [`APB_WIDTH-1:0]           M_PADDR;
    wire                            M_PWRITE;
    wire [`SLAVES-1:0]              M_PSELx;  // not shared
    wire                            M_PENABLE;
    wire [`DATA_WIDTH-1:0]          M_PWDATA;
    wire [`SLAVES*`DATA_WIDTH-1:0]  M_PRDATA; // input to intercon
    wire [`SLAVES-1:0]              M_PREADY; // input

    // Master apb interfaces
    wire [`CORES*`APB_WIDTH-1:0]    w_PADDR;
    wire [`CORES-1:0]               w_PWRITE;
    wire [`CORES-1:0]               w_PSELx;
    wire [`CORES-1:0]               w_PENABLE;
    wire [`CORES*`DATA_WIDTH-1:0]   w_PWDATA;
    wire [`CORES*`DATA_WIDTH-1:0]   w_PRDATA;
    wire [`CORES-1:0]               w_PREADY;

    // Interrupts
`ifdef DEF_ENABLE_INT
    wire [`DEF_NUM_INT-1:0]             ints;
    wire [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_data;
`endif

    apb_intercon_s # (
        .MASTER_PORTS   (`CORES),
        .SLAVE_PORTS    (`SLAVES),
        .BUS_WIDTH      (`APB_WIDTH),
        .DATA_WIDTH     (`DATA_WIDTH),
        .HAS_PSELX_ADDR (1)
    ) apb (
        .clk        (clk),
        .reset      (soft_reset),
        // APB master to slave
        .S_PADDR    (w_PADDR),
        .S_PWRITE   (w_PWRITE),
        .S_PSELx    (w_PSELx),
        .S_PENABLE  (w_PENABLE),
        .S_PWDATA   (w_PWDATA),
        .S_PRDATA   (w_PRDATA),
        .S_PREADY   (w_PREADY),
        // shared bus
        .M_PADDR    (M_PADDR),
        .M_PWRITE   (M_PWRITE),
        .M_PSELx    (M_PSELx),
        .M_PENABLE  (M_PENABLE),
        .M_PWDATA   (M_PWDATA),
        .M_PRDATA   (M_PRDATA),
        .M_PREADY   (M_PREADY)
    );

`ifdef DEF_USE_WATCHDOG
    vmicro16_watchdog_apb # (
        .BUS_WIDTH  (`APB_WIDTH),
        .NAME       ("WDOG0")
    ) wdog0_apb (
        .clk        (clk),
        .reset      (),
        // apb slave to master interface
        .S_PADDR    (),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_WDOG0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (),
        .S_PRDATA   (),
        .S_PREADY   (M_PREADY[`APB_PSELX_WDOG0]),

        .wdreset    (wdreset)
    );
`endif

    vmicro16_gpio_apb # (
        .BUS_WIDTH  (`APB_WIDTH),
        .DATA_WIDTH (`DATA_WIDTH),
        .PORTS      (`APB_GPIO0_PINS),
        .NAME       ("GPIO0")
    ) gpio0_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_GPIO0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_GPIO0*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_GPIO0]),
        .gpio       (gpio0)
    );

    // GPIO1 for Seven segment displays (16 pin)
    vmicro16_gpio_apb # (
        .BUS_WIDTH  (`APB_WIDTH),
        .DATA_WIDTH (`DATA_WIDTH),
        .PORTS      (`APB_GPIO1_PINS),
        .NAME       ("GPIO1")
    ) gpio1_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_GPIO1]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_GPIO1*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_GPIO1]),
        .gpio       (gpio1)
    );

    // GPIO2 for Seven segment displays (8 pin)
    vmicro16_gpio_apb # (
        .BUS_WIDTH  (`APB_WIDTH),
        .DATA_WIDTH (`DATA_WIDTH),
        .PORTS      (`APB_GPIO2_PINS),
        .NAME       ("GPIO2")
    ) gpio2_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_GPIO2]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_GPIO2*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_GPIO2]),
        .gpio       (gpio2)
    );

    apb_uart_tx # (
        .DATA_WIDTH (8),
        .ADDR_EXP   (4) // 2^4 = 16 FIFO words
    ) uart0_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_UART0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_UART0*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_UART0]),
        // uart wires
        .tx_wire    (uart_tx),
        .rx_wire    (uart_rx)
    );

    timer_apb timr0 (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_TIMR0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_TIMR0*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_TIMR0])
        `ifdef DEF_ENABLE_INT
        ,.out       (ints     [`DEF_INT_TIMR0]),
         .int_data  (ints_data[`DEF_INT_TIMR0*`DATA_WIDTH +: `DATA_WIDTH])
        `endif
    );

    // Shared register set for system-on-chip info
    // R0 = number of cores
    vmicro16_regs_apb # (
        .BUS_WIDTH          (`APB_WIDTH),
        .DATA_WIDTH         (`DATA_WIDTH),
        .CELL_DEPTH         (8),
        .PARAM_DEFAULTS_R0  (`CORES),
        .PARAM_DEFAULTS_R1  (`SLAVES)
    ) regs0_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_REGS0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_REGS0*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_REGS0])
    );

    vmicro16_bram_ex_apb # (
        .BUS_WIDTH    (`APB_WIDTH),
        .MEM_WIDTH    (`DATA_WIDTH),
        .MEM_DEPTH    (`APB_BRAM0_CELLS),
        .CORE_ID_BITS (`clog2(`CORES))
    ) bram_apb (
        .clk        (clk),
        .reset      (soft_reset),
        // apb slave to master interface
        .S_PADDR    (M_PADDR),
        .S_PWRITE   (M_PWRITE),
        .S_PSELx    (M_PSELx[`APB_PSELX_BRAM0]),
        .S_PENABLE  (M_PENABLE),
        .S_PWDATA   (M_PWDATA),
        .S_PRDATA   (M_PRDATA[`APB_PSELX_BRAM0*`DATA_WIDTH +: `DATA_WIDTH]),
        .S_PREADY   (M_PREADY[`APB_PSELX_BRAM0])
    );

    // There must be at least 1 core
    `static_assert(`CORES > 0)
    `static_assert(`DEF_MEM_INSTR_DEPTH > 0)
    `static_assert(`DEF_MMU_TIM0_CELLS > 0)

    // Single instruction memory
`ifndef DEF_CORE_HAS_INSTR_MEM
    // slave input/outputs from interconnect
    wire [`APB_WIDTH-1:0]           instr_M_PADDR;
    wire                            instr_M_PWRITE;
    wire [1-1:0]                    instr_M_PSELx;  // not shared
    wire                            instr_M_PENABLE;
    wire [`DATA_WIDTH-1:0]          instr_M_PWDATA;
    wire [1*`DATA_WIDTH-1:0]        instr_M_PRDATA; // slave response
    wire [1-1:0]                    instr_M_PREADY; // slave response

    // Master apb interfaces
    wire [`CORES*`APB_WIDTH-1:0]    instr_w_PADDR;
    wire [`CORES-1:0]               instr_w_PWRITE;
    wire [`CORES-1:0]               instr_w_PSELx;
    wire [`CORES-1:0]               instr_w_PENABLE;
    wire [`CORES*`DATA_WIDTH-1:0]   instr_w_PWDATA;
    wire [`CORES*`DATA_WIDTH-1:0]   instr_w_PRDATA;
    wire [`CORES-1:0]               instr_w_PREADY;
```

```
`ifdef DEF_USE_REPROG
  wire [`clog2(`DEF_MEM_INSTR_DEPTH)-1:0] prog_addr;
  wire [`DATA_WIDTH-1:0] prog_data;
313
314
315
316
                     wire prog_we;
317
                     uart_prog rom_prog (
                                           (clk)
318
                          .clk
                                           (reset | wdreset),
319
                          .reset
320
                          // input stream
321
                          .uart_rx
                                           (uart_rx),
                          // programmer .addr (
322
                                           (prog_addr),
(prog_data),
323
324
                          .data
325
                                           (prog_we),
                          .we
326
                          .prog
                                           (prog_prog)
327
               `endif

    `ifdef DEF_USE_REPROG
        vmicro16_bram_prog_apb
    `else
        vmicro16_bram_apb
    `endif
    # (
        .BUS_WIDTH  (`APB_WIDTH),
        .MEM_WIDTH  (`DATA_WIDTH),
        .MEM_DEPTH  (`DEF_MEM_INSTR_DEPTH),
        .USE_INITS  (1),
        .NAME       ("INSTR_ROM_G")
    ) instr_rom_apb (
        .clk        (clk),
        .reset      (reset),
        .S_PADDR    (instr_M_PADDR),
        .S_PWRITE   (0),
        .S_PSELx    (instr_M_PSELx),
        .S_PENABLE  (instr_M_PENABLE),
        .S_PWDATA   (0),
        .S_PRDATA   (instr_M_PRDATA),
        .S_PREADY   (instr_M_PREADY)

        `ifdef DEF_USE_REPROG
            ,
            .addr   (prog_addr),
            .data   (prog_data),
            .we     (prog_we),
            .prog   (prog_prog)
        `endif
    );

    apb_intercon_s # (
        .MASTER_PORTS   (`CORES),
        .SLAVE_PORTS    (1),
        .BUS_WIDTH      (`APB_WIDTH),
        .DATA_WIDTH     (`DATA_WIDTH),
        .HAS_PSELX_ADDR (0)
    ) apb_instr_intercon (
        .clk        (clk),
        .reset      (soft_reset),
        // APB master from cores
        // master
        .S_PADDR    (instr_w_PADDR),
        .S_PWRITE   (instr_w_PWRITE),
        .S_PSELx    (instr_w_PSELx),
        .S_PENABLE  (instr_w_PENABLE),
        .S_PWDATA   (instr_w_PWDATA),
        .S_PRDATA   (instr_w_PRDATA),
        .S_PREADY   (instr_w_PREADY),
        // shared bus slaves
        // slave outputs
        .M_PADDR    (instr_M_PADDR),
        .M_PWRITE   (instr_M_PWRITE),
        .M_PSELx    (instr_M_PSELx),
        .M_PENABLE  (instr_M_PENABLE),
        .M_PWDATA   (instr_M_PWDATA),
        .M_PRDATA   (instr_M_PRDATA),
        .M_PREADY   (instr_M_PREADY)
    );
`endif

    genvar i;
    generate for(i = 0; i < `CORES; i = i + 1) begin : cores

        vmicro16_core # (
            .CORE_ID            (i),
            .DATA_WIDTH         (`DATA_WIDTH),

            .MEM_INSTR_DEPTH    (`DEF_MEM_INSTR_DEPTH),
            .MEM_SCRATCH_DEPTH  (`DEF_MMU_TIM0_CELLS)
        ) c1 (
            .clk        (clk),
            .reset      (soft_reset),

            // debug
            .halt       (w_halt[i]),

            // interrupts
            .ints       (ints),
            .ints_data  (ints_data),

            // Output master port 1
            .w_PADDR    (w_PADDR   [`APB_WIDTH*i +: `APB_WIDTH]  ),
            .w_PWRITE   (w_PWRITE  [i]                           ),
            .w_PSELx    (w_PSELx   [i]                           ),
            .w_PENABLE  (w_PENABLE [i]                           ),
            .w_PWDATA   (w_PWDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w_PRDATA   (w_PRDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w_PREADY   (w_PREADY  [i]                           )

`ifndef DEF_CORE_HAS_INSTR_MEM
            // APB instruction rom
            , // Output master port 2
            .w2_PADDR   (instr_w_PADDR   [`APB_WIDTH*i +: `APB_WIDTH]  ),
            //.w2_PWRITE  (instr_w_PWRITE  [i]                           ),
            .w2_PSELx   (instr_w_PSELx   [i]                           ),
            .w2_PENABLE (instr_w_PENABLE [i]                           ),
            //.w2_PWDATA  (instr_w_PWDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w2_PRDATA  (instr_w_PRDATA  [`DATA_WIDTH*i +: `DATA_WIDTH]),
            .w2_PREADY  (instr_w_PREADY  [i]                           )
`endif
        );
    end
    endgenerate

    // Formal Verification
    `ifdef FORMAL
    wire all_halted = &w_halt;

    // Count number of clocks each core is spending on
    //   bus transactions
    reg [15:0] bus_core_times    [0:`CORES-1];
    reg [15:0] core_work_times   [0:`CORES-1];
    reg [15:0] instr_fetch_times [0:`CORES-1];

    integer i2;
    initial
        for(i2 = 0; i2 < `CORES; i2 = i2 + 1) begin
            bus_core_times[i2]  = 0;
            core_work_times[i2] = 0;
        end

    // total bus time
    generate
        // assumed loop header and select condition: these lines are
        // missing from this listing; the block label is hypothetical
        genvar g2;
        for (g2 = 0; g2 < `CORES; g2 = g2 + 1) begin : formal_for_times
            always @(posedge clk) begin
                if (w_PSELx[g2])
                    bus_core_times[g2] <= bus_core_times[g2] + 1;

                // Core working time
                `ifndef DEF_CORE_HAS_INSTR_MEM
                    if (!w_PSELx[g2] && !instr_w_PSELx[g2])
                `else
                    if (!w_PSELx[g2])
                `endif
                        if (!w_halt[g2])
                            core_work_times[g2] <= core_work_times[g2] + 1;
            end
        end
    endgenerate

    reg [15:0] bus_time_average   = 0;
    reg [15:0] bus_reqs_average   = 0;
    reg [15:0] fetch_time_average = 0;
    reg [15:0] work_time_average  = 0;

    always @(all_halted) begin
        for (i2 = 0; i2 < `CORES; i2 = i2 + 1) begin
            bus_time_average   = bus_time_average   + bus_core_times[i2];
            bus_reqs_average   = bus_reqs_average   + bus_core_reqs_count[i2];
            work_time_average  = work_time_average  + core_work_times[i2];
            fetch_time_average = fetch_time_average + instr_fetch_times[i2];
        end

        bus_time_average   = bus_time_average   / `CORES;
        bus_reqs_average   = bus_reqs_average   / `CORES;
        work_time_average  = work_time_average  / `CORES;
        fetch_time_average = fetch_time_average / `CORES;
    end

    // Count number of bus requests per core

    // clock delay of w_PSELx
    reg  [`CORES-1:0] bus_core_reqs_last;
    // rising edges of each
    wire [`CORES-1:0] bus_core_reqs_real;
    // storage for counters for each core
    reg  [15:0]       bus_core_reqs_count [0:`CORES-1];

    initial
        for(i2 = 0; i2 < `CORES; i2 = i2 + 1)
            bus_core_reqs_count[i2] = 0;

    // 1 clk delay to detect rising edge
    always @(posedge clk)
        bus_core_reqs_last <= w_PSELx;

    generate
        // assumed loop header and left-hand side of the comparison: these
        // lines are missing from this listing; the block label is hypothetical
        genvar g3;
        for (g3 = 0; g3 < `CORES; g3 = g3 + 1) begin : formal_for_reqs
            assign bus_core_reqs_real[g3] = w_PSELx[g3] >
                                            bus_core_reqs_last[g3];

            always @(posedge clk)
                if (bus_core_reqs_real[g3])
                    bus_core_reqs_count[g3] <= bus_core_reqs_count[g3] + 1;
        end
    endgenerate

    `ifndef DEF_CORE_HAS_INSTR_MEM
        integer i3;
        initial
            for(i3 = 0; i3 < `CORES; i3 = i3 + 1)
                instr_fetch_times[i3] = 0;

        // total bus time
        // Instruction fetches occur on the w2 master port
        generate
            genvar g4;
            for (g4 = 0; g4 < `CORES; g4 = g4 + 1) begin : formal_for_fetch_times
                always @(posedge clk)
                    if (instr_w_PSELx[g4])
                        instr_fetch_times[g4] <= instr_fetch_times[g4] + 1;
            end
        endgenerate
    `endif

    `endif // end FORMAL
endmodule
```
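
The per-core request counters in the `FORMAL` block above count rising edges of each core's select line by comparing it with a copy delayed by one clock (`bus_core_reqs_last`). A minimal single-bit sketch of that idea is given below. The module name `req_edge_counter` and its ports are hypothetical and are not part of the project sources.

```verilog
// Hypothetical helper, not part of the Vmicro16 sources: counts rising
// edges of a single request line using a one-clock delayed copy.
module req_edge_counter (
    input             clk,
    input             req,      // e.g. one bit of w_PSELx
    output reg [15:0] count     // number of rising edges seen so far
);
    reg req_last;               // req delayed by one clock

    initial begin
        req_last = 1'b0;
        count    = 16'd0;
    end

    // high for one clock when req goes from 0 to 1
    wire req_rise = req & ~req_last;

    always @(posedge clk) begin
        req_last <= req;
        if (req_rise)
            count <= count + 16'd1;
    end
endmodule
```

A select line that stays high for several clocks is therefore counted once, which is what separates the request count from the total bus time measured by `bus_core_times`.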

## C.4 vmicro16\_periph.v

Various memory-mapped APB peripherals, such as GPIO, UART, timers, and memory.

```
// Vmicro16 peripheral modules

`include "vmicro16_soc_config.v"
`include "formal.v"

// Simple watchdog peripheral
module vmicro16_watchdog_apb # (
    parameter BUS_WIDTH = 16,
    parameter NAME      = "WD",
    parameter CLK_HZ    = 50_000_000
) (
    input clk,
    input reset,

    // APB Slave to master interface
    input  [0:0]    S_PADDR, // not used (optimised out)
    input           S_PWRITE,
    input           S_PSELx,
    input           S_PENABLE,
    input  [0:0]    S_PWDATA,

    // prdata not used
    output [0:0]    S_PRDATA,
    output          S_PREADY,

    // watchdog reset, active high
    output reg      wdreset
);
    //assign S_PRDATA = (S_PSELx & S_PENABLE) ? gpio : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    wire   we       = (S_PSELx & S_PENABLE & S_PWRITE);

    // countdown timer
    reg [`clog2(CLK_HZ)-1:0] timer = CLK_HZ;

    wire w_wdreset = (timer == 0);

    // infer a register to aid timing
    initial wdreset = 0;
    always @(posedge clk)
        wdreset <= w_wdreset;

    always @(posedge clk)
        if (we) begin
            $display($time, "\t%s <= RESET", NAME);
            timer <= CLK_HZ;
        end else begin
            timer <= timer - 1;
        end
endmodule

module timer_apb # (
    parameter CLK_HZ = 50_000_000
) (
    input clk,
    input reset,

    input clk_en,

    // 0 16-bit value     R/W
    // 1 16-bit control   R     b0 = start, b1 = reset
    // 2 16-bit prescaler
    input      [1:0]                S_PADDR,

    input                           S_PWRITE,
    input                           S_PSELx,
    input                           S_PENABLE,
    input      [`DATA_WIDTH-1:0]    S_PWDATA,

    output reg [`DATA_WIDTH-1:0]    S_PRDATA,
    output                          S_PREADY,

    output                          out,
    output     [`DATA_WIDTH-1:0]    int_data
);
    // assumed decode: these lines are missing from this listing; the
    // pattern is the one used by the other peripherals in this file
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    wire   en       = (S_PSELx & S_PENABLE);
    wire   we       = (S_PSELx & S_PENABLE & S_PWRITE);

    reg [`DATA_WIDTH-1:0] r_counter = 0;
    reg [`DATA_WIDTH-1:0] r_load    = 0;
    reg [`DATA_WIDTH-1:0] r_pres    = 0;
    reg [`DATA_WIDTH-1:0] r_ctrl    = 0;

    localparam CTRL_START = 0;
    localparam CTRL_RESET = 1;
    localparam CTRL_INT   = 2;

    localparam ADDR_LOAD = 2'b00;
    localparam ADDR_CTRL = 2'b01;
    localparam ADDR_PRES = 2'b10;

    always @(*) begin
        S_PRDATA = 0;
        if (en)
            case(S_PADDR)
                ADDR_LOAD: S_PRDATA = r_counter;
                ADDR_CTRL: S_PRDATA = r_ctrl;
                //ADDR_CTRL: S_PRDATA = r_pres;
                default:   S_PRDATA = 0;
            endcase
    end

    // prescaler counts from r_pres to 0, emitting a stb signal
    // to enable the r_counter step
    reg [`DATA_WIDTH-1:0] r_pres_counter = 0;
    wire counter_en = (r_pres_counter == 0);
    always @(posedge clk)
        if (r_pres_counter == 0)
            r_pres_counter <= r_pres;
        else
            r_pres_counter <= r_pres_counter - 1;

    always @(posedge clk)
        if (we)
            case(S_PADDR)
                // Write to the load register:
                //   Set load register
                //   Set counter register
                ADDR_LOAD: begin
                    r_load    <= S_PWDATA;
                    r_counter <= S_PWDATA;
                    $display($time, "\ttimr0: WRITE LOAD: %h", S_PWDATA);
                end
                ADDR_CTRL: begin
                    r_ctrl <= S_PWDATA;
                    $display($time, "\ttimr0: WRITE CTRL: %h", S_PWDATA);
                end
                ADDR_PRES: begin
                    r_pres <= S_PWDATA;
                    $display($time, "\t\ttimr0: WRITE PRES: %h", S_PWDATA);
                end
            endcase
        else
            if (r_ctrl[CTRL_START]) begin
                if (r_counter == 0)
                    r_counter <= r_load;
                else if(counter_en)
                    r_counter <= r_counter -1;
            end else if (r_ctrl[CTRL_RESET])
                r_counter <= r_load;

    // generate the output pulse when r_counter == 0
    //   out = (counter reached zero && counter started)
    assign out      = (r_counter == 0) && r_ctrl[CTRL_START]; // && r_ctrl[CTRL_INT];
    assign int_data = {`DATA_WIDTH{1'b1}};
endmodule

// APB wrapped programmable vmicro16_bram
module vmicro16_bram_prog_apb # (
    parameter BUS_WIDTH = 16,
    parameter MEM_WIDTH = 16,
    parameter MEM_DEPTH = 64,
    parameter APB_PADDR = 0,
    parameter USE_INITS = 0,
    parameter NAME      = "BRAMPROG",
    parameter CORE_ID   = 0
) (
    input clk,
    input reset,
    // APB Slave to master interface
    input  [`clog2(MEM_DEPTH)-1:0]  S_PADDR,
    input                           S_PWRITE,
    input                           S_PSELx,
    input                           S_PENABLE,
    input  [BUS_WIDTH-1:0]          S_PWDATA,

    output [BUS_WIDTH-1:0]          S_PRDATA,
    output                          S_PREADY,

    // interface to program the instruction memory
    input  [`clog2(`DEF_MEM_INSTR_DEPTH)-1:0] addr,
    input  [`DATA_WIDTH-1:0]                  data,
    input                                     we,
    input                                     prog
);
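    wire [MEM_WIDTH-1:0] mem_out;

    assign S_PRDATA = (S_PSELx & S_PENABLE) ? mem_out : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    wire   s_we     = (S_PSELx & S_PENABLE & S_PWRITE);

    // assumed mux: the declarations of mem_addr, mem_data and mem_we are
    // missing from this listing; here the programming interface is
    // selected while prog is high, otherwise the APB slave interface
    wire [`clog2(MEM_DEPTH)-1:0] mem_addr = prog ? addr : S_PADDR;
    wire [MEM_WIDTH-1:0]         mem_data = prog ? data : S_PWDATA;
    wire                         mem_we   = prog ? we   : s_we;

    vmicro16_bram # (
        .MEM_WIDTH  (MEM_WIDTH),
        .MEM_DEPTH  (MEM_DEPTH),
        .NAME       ("BRAMPROG"),
        .USE_INITS  (0),
        .CORE_ID    (-1)    // assumed, as in vmicro16_bram_apb; value missing from this listing
    ) bram_apb (
        .clk        (clk),
        .reset      (reset),

        .mem_addr   (mem_addr),
        .mem_in     (mem_data),
        .mem_we     (mem_we),
        .mem_out    (mem_out)
    );
endmodule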

// APB wrapped vmicro16_bram
module vmicro16_bram_apb # (
    parameter BUS_WIDTH = 16,
    parameter MEM_WIDTH = 16,
    parameter MEM_DEPTH = 64,
    parameter APB_PADDR = 0,
    parameter USE_INITS = 0,
    parameter NAME      = "BRAM",
    parameter CORE_ID   = 0
) (
    input clk,
    input reset,
    // APB Slave to master interface
    input  [`clog2(MEM_DEPTH)-1:0]  S_PADDR,
    input                           S_PWRITE,
    input                           S_PSELx,
    input                           S_PENABLE,
    input  [BUS_WIDTH-1:0]          S_PWDATA,

    output [BUS_WIDTH-1:0]          S_PRDATA,
    output                          S_PREADY
);
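    wire [MEM_WIDTH-1:0] mem_out;

    // assumed decode: these assigns are missing from this listing; the
    // pattern is the one used by vmicro16_bram_prog_apb above
    assign S_PRDATA = (S_PSELx & S_PENABLE) ? mem_out : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    assign we       = (S_PSELx & S_PENABLE & S_PWRITE);

    always @(*)
        if (S_PSELx && S_PENABLE)
            $display($time, "\t\t%s => %h", NAME, mem_out);

    always @(posedge clk)
        if (we)
            // assumed format: the original print statement is missing
            // from this listing
            $display($time, "\t\t%s[%h] <= %h",
                NAME, S_PADDR, S_PWDATA);

    vmicro16_bram # (
        .MEM_WIDTH  (MEM_WIDTH),
        .MEM_DEPTH  (MEM_DEPTH),
        .NAME       (NAME),
        .USE_INITS  (1),
        .CORE_ID    (-1)
    ) bram_apb (
        .clk        (clk),
        .reset      (reset),

        .mem_addr   (S_PADDR),
        .mem_in     (S_PWDATA),
        .mem_we     (we),
        .mem_out    (mem_out)
    );
endmodule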

// Shared memory with hardware monitor (LWEX/SWEX)
module vmicro16_bram_ex_apb # (
    parameter BUS_WIDTH    = 16,
    parameter MEM_WIDTH    = 16,
    parameter MEM_DEPTH    = 64,
    parameter CORE_ID_BITS = 3,
    parameter SWEX_SUCCESS = 16'h0000,
    parameter SWEX_FAIL    = 16'h0001
) (
    input clk,
    input reset,

    // |19    |18    |16             |15
    // | LWEX | SWEX | 3 bit CORE_ID |     S_PADDR |
    input  [`APB_WIDTH-1:0]     S_PADDR,

    input                       S_PWRITE,
    input                       S_PSELx,
    input                       S_PENABLE,
    input  [MEM_WIDTH-1:0]      S_PWDATA,

    output reg [MEM_WIDTH-1:0]  S_PRDATA,
    output                      S_PREADY
);
    // exclusive flag checks
    wire [MEM_WIDTH-1:0] mem_out;
    reg                  swex_success = 0;

    localparam ADDR_BITS = `clog2(MEM_DEPTH);

    // hack to create a 1 clock delay to S_PREADY
    // for bram to be ready
    reg cdelay = 1;
    always @(posedge clk)
        if (S_PSELx)
            cdelay <= 0;
        else
            cdelay <= 1;

    //assign S_PRDATA = (S_PSELx & S_PENABLE) ? swex_success ? 16'hF0F0 : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE & (!cdelay)) ? 1'b1 : 1'b0;
    assign we       = (S_PSELx & S_PENABLE & S_PWRITE);
    wire   en       = (S_PSELx & S_PENABLE);

    // Similar to:
    //   http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204f/Cihbghef.html

    // mem_wd is the CORE_ID sent in bits [18:16]
    localparam TOP_BIT_INDEX     = `APB_WIDTH -1;
    localparam PADDR_CORE_ID_MSB = TOP_BIT_INDEX - 2;
    localparam PADDR_CORE_ID_LSB = PADDR_CORE_ID_MSB - (CORE_ID_BITS-1);

    // [LWEX, CORE_ID, mem_addr] from S_PADDR
    wire                    lwex     = S_PADDR[TOP_BIT_INDEX];
    wire                    swex     = S_PADDR[TOP_BIT_INDEX-1];
    wire [CORE_ID_BITS-1:0] core_id  = S_PADDR[PADDR_CORE_ID_MSB:PADDR_CORE_ID_LSB];
    // CORE_ID to write to ex_flags register
    wire [ADDR_BITS-1:0]    mem_addr = S_PADDR[ADDR_BITS-1:0];

    wire [CORE_ID_BITS:0] ex_flags_read;
    wire                  is_locked      = |ex_flags_read;
    wire                  is_locked_self = is_locked && (core_id == (ex_flags_read-1));

    // Check exclusive access flags
    always @(*) begin
        swex_success = 0;
        if (en)
            // bug!
            if (!swex && !lwex)
                swex_success = 1;
            else if (swex)
                if (is_locked && !is_locked_self)
                    // someone else has locked it
                    swex_success = 0;
                else if (is_locked && is_locked_self)
                    swex_success = 1;
    end

    always @(*)
        if (swex)
            if (swex_success)
                S_PRDATA = SWEX_SUCCESS;
            else
                S_PRDATA = SWEX_FAIL;
        else
            S_PRDATA = mem_out;

    wire reg_we = en && ((lwex && !is_locked)
                     || (swex && swex_success));

    reg [CORE_ID_BITS:0] reg_wd;
    always @(*) begin
        reg_wd = {{CORE_ID_BITS}{1'b0}};

        if (en)
            // if wanting to lock the addr
            if (lwex)
                // and not already locked
                if (!is_locked) begin
                    reg_wd = (core_id + 1);
                end
            else if (swex)
                if (is_locked && is_locked_self)
                    reg_wd = {{CORE_ID_BITS}{1'b0}};
    end

    // Exclusive flag for each memory cell
    vmicro16_bram # (
        .MEM_WIDTH  (CORE_ID_BITS + 1),
        .MEM_DEPTH  (MEM_DEPTH),
        .USE_INITS  (0),
        .NAME       ("rexram")
    ) ram_exflags (
        .clk        (clk),
        .reset      (reset),

        .mem_addr   (mem_addr),
        .mem_in     (reg_wd),
        .mem_we     (reg_we),
        .mem_out    (ex_flags_read)
    );

    always @(*)
        if (S_PSELx && S_PENABLE)
            $display($time, "\t\tBRAMex[%h] READ %h\tCORE: %h",
                mem_addr, mem_out, S_PADDR[16 +: CORE_ID_BITS]);

    always @(posedge clk)
        if (we)
            // assumed format, mirroring the READ print above: the original
            // print statement is missing from this listing
            $display($time, "\t\tBRAMex[%h] WRITE %h\tCORE: %h",
                mem_addr, S_PWDATA, S_PADDR[16 +: CORE_ID_BITS]);

    vmicro16_bram # (
        .MEM_WIDTH  (MEM_WIDTH),
        .MEM_DEPTH  (MEM_DEPTH),
        .USE_INITS  (0),
        .NAME       ("BRAMexinst")
    ) bram_apb (
        .clk        (clk),
        .reset      (reset),

        .mem_addr   (mem_addr),
        .mem_in     (S_PWDATA),
        .mem_we     (we && swex_success),
        .mem_out    (mem_out)
    );
endmodule

// Simple APB memory-mapped register set
module vmicro16_regs_apb # (
    parameter BUS_WIDTH         = 16,
    parameter DATA_WIDTH        = 16,
    parameter CELL_DEPTH        = 8,
    parameter PARAM_DEFAULTS_R0 = 0,
    parameter PARAM_DEFAULTS_R1 = 0
) (
    input clk,
    input reset,
    // APB Slave to master interface
    input  [`clog2(CELL_DEPTH)-1:0] S_PADDR,
    input                           S_PWRITE,
    input                           S_PSELx,
    input                           S_PENABLE,
    input  [DATA_WIDTH-1:0]         S_PWDATA,

    output [DATA_WIDTH-1:0]         S_PRDATA,
    output                          S_PREADY
);
    wire [DATA_WIDTH-1:0] rd1;

    assign S_PRDATA = (S_PSELx & S_PENABLE) ? rd1 : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    assign reg_we   = (S_PSELx & S_PENABLE & S_PWRITE);

    always @(*)
        if (reg_we)
            $display($time, "\t\tREGS_APB[%h] <= %h",
                S_PADDR, S_PWDATA);

    always @(*)
        `rassert(reg_we == (S_PSELx & S_PENABLE & S_PWRITE))

    vmicro16_regs # (
        .CELL_DEPTH         (CELL_DEPTH),
        .CELL_WIDTH         (DATA_WIDTH),
        .PARAM_DEFAULTS_R0  (PARAM_DEFAULTS_R0),
        .PARAM_DEFAULTS_R1  (PARAM_DEFAULTS_R1)
    ) regs_apb (
        .clk    (clk),
        .reset  (reset),
        // port 1
        .rs1    (S_PADDR),
        .rd1    (rd1),
        .we     (reg_we),
        .ws1    (S_PADDR),
        .wd     (S_PWDATA)
        // port 2 unconnected
        //.rs2    (),
        //.rd2    ()
    );
endmodule

// Simple GPIO write only peripheral
module vmicro16_gpio_apb # (
    parameter BUS_WIDTH  = 16,
    parameter DATA_WIDTH = 16,
    parameter PORTS      = 8,
    parameter NAME       = "GPIO"   // assumed default; value missing from this listing
) (
    input clk,
    input reset,
    // APB Slave to master interface
    input  [0:0]                S_PADDR, // not used (optimised out)
    input                       S_PWRITE,
    input                       S_PSELx,
    input                       S_PENABLE,
    input  [DATA_WIDTH-1:0]     S_PWDATA,

    output [DATA_WIDTH-1:0]     S_PRDATA,
    output                      S_PREADY,
    output reg [PORTS-1:0]      gpio
);
    assign S_PRDATA = (S_PSELx & S_PENABLE) ? gpio : 16'h0000;
    assign S_PREADY = (S_PSELx & S_PENABLE) ? 1'b1 : 1'b0;
    assign ports_we = (S_PSELx & S_PENABLE & S_PWRITE);

    always @(posedge clk)
        if (reset)
            gpio <= 0;
        else if (ports_we) begin
            $display($time, "\t%s <= %h", NAME, S_PWDATA[PORTS-1:0]);
            gpio <= S_PWDATA[PORTS-1:0];
        end
endmodule
```
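
The APB handshake used by the GPIO peripheral above (select, then enable, with the write data latched on the clock edge where both are high) can be exercised with a small self-checking test bench. The following is a minimal sketch only: `gpio_apb_sketch` and `gpio_apb_sketch_tb` are hypothetical names that do not appear in the project sources, and the slave is a simplified stand-in written for this example rather than the listing itself.

```verilog
// Hypothetical example, not part of the original sources.
// A cut-down write-only GPIO slave using the same SEL/ENABLE/WRITE handshake.
`timescale 1ns/1ps

module gpio_apb_sketch # (
    parameter DATA_WIDTH = 16,
    parameter PORTS      = 8
) (
    input                       clk,
    input                       reset,
    input                       S_PWRITE,
    input                       S_PSELx,
    input                       S_PENABLE,
    input      [DATA_WIDTH-1:0] S_PWDATA,
    output     [DATA_WIDTH-1:0] S_PRDATA,
    output                      S_PREADY,
    output reg [PORTS-1:0]      gpio
);
    // The slave is being accessed only when both select and enable are high
    wire access = S_PSELx & S_PENABLE;

    assign S_PRDATA = access ? {{(DATA_WIDTH-PORTS){1'b0}}, gpio}
                             : {DATA_WIDTH{1'b0}};
    assign S_PREADY = access;

    always @(posedge clk)
        if (reset)
            gpio <= 0;
        else if (access & S_PWRITE)
            gpio <= S_PWDATA[PORTS-1:0];
endmodule

module gpio_apb_sketch_tb;
    reg         clk     = 0;
    reg         reset   = 1;
    reg         pwrite  = 0;
    reg         psel    = 0;
    reg         penable = 0;
    reg  [15:0] pwdata  = 0;
    wire [15:0] prdata;
    wire        pready;
    wire [7:0]  gpio;

    gpio_apb_sketch dut (
        .clk       (clk),
        .reset     (reset),
        .S_PWRITE  (pwrite),
        .S_PSELx   (psel),
        .S_PENABLE (penable),
        .S_PWDATA  (pwdata),
        .S_PRDATA  (prdata),
        .S_PREADY  (pready),
        .gpio      (gpio)
    );

    always #5 clk = ~clk;

    initial begin
        // Hold reset over two rising edges
        @(posedge clk);
        @(posedge clk);
        #1 reset = 0;

        // Setup phase: select the slave and present the write data
        psel = 1; pwrite = 1; pwdata = 16'h00A5;

        // Access phase: raise enable one clock later
        @(posedge clk);
        #1 penable = 1;

        // The slave latches the data on this edge
        @(posedge clk);
        #1 psel = 0; penable = 0; pwrite = 0;

        if (gpio !== 8'hA5) $display("FAIL: gpio = %h", gpio);
        else                $display("PASS: gpio = %h", gpio);
        $finish;
    end
endmodule
```

The setup and access phases are separated by one clock so that the stand-in slave sees `S_PSELx` before `S_PENABLE`, which mirrors the T1 and T2 states of the MMU master listed in the next section.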

## C.5 vmicro16.v

Vmicro16 CPU core module.

```
// This file contains multiple modules
//   Verilator likes 1 file for each module
/* verilator lint_off DECLFILENAME */
/* verilator lint_off UNUSED */
/* verilator lint_off BLKSEQ */
/* verilator lint_off WIDTH */

// Include Vmicro16 ISA containing definitions for the bits
`include "vmicro16_isa.v"

`include "clog2.v"
`include "formal.v"

// This module aims to be a SYNCHRONOUS, WRITE_FIRST BLOCK RAM
//   https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf
//   https://www.xilinx.com/support/documentation/user_guides/ug383.pdf
//   https://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_4/ug901-vivado-synthesis.pdf
module vmicro16_bram # (
    parameter MEM_WIDTH = 16,
    parameter MEM_DEPTH = 64,
    parameter CORE_ID   = 0,
    parameter USE_INITS = 0,
    parameter PARAM_DEFAULTS_R0 = 0,
    parameter PARAM_DEFAULTS_R1 = 0,
    parameter PARAM_DEFAULTS_R2 = 0,
    parameter PARAM_DEFAULTS_R3 = 0,
    parameter NAME = "BRAM"
) (
    input clk,
    input reset,

    input      [`clog2(MEM_DEPTH)-1:0] mem_addr,
    input      [MEM_WIDTH-1:0]         mem_in,
    input                              mem_we,
    output reg [MEM_WIDTH-1:0]         mem_out
);
    // memory vector
    (* ram_style = "block" *)
    reg [MEM_WIDTH-1:0] mem [0:MEM_DEPTH-1];

    // not synthesizable
    integer i;
    initial begin
        for (i = 0; i < MEM_DEPTH; i = i + 1) mem[i] = 0;
        mem[0] = PARAM_DEFAULTS_R0;
```

```
        mem[1] = PARAM_DEFAULTS_R1;
        mem[2] = PARAM_DEFAULTS_R2;
        mem[3] = PARAM_DEFAULTS_R3;

        if (USE_INITS) begin
            //`define TEST_SW
            `ifdef TEST_SW
            $readmemh("E:\\Projects\\uni\\vmicro16\\sw\\verilog_memh.txt", mem);
            `endif

            `ifdef TEST_ASM
            $readmemh("E:\\Projects\\uni\\vmicro16\\sw\\asm.s.hex", mem);
            `endif

            //`define TEST_COND
            `ifdef TEST_COND
            mem[0] = {`VMICRO16_OP_MOVI, 3'h7, 8'hC0}; // lock
            `endif

            //`define TEST_CMP
            `ifdef TEST_CMP
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'h0A};
            mem[1] = {`VMICRO16_OP_MOVI, 3'h1, 8'h0B};
            mem[2] = {`VMICRO16_OP_CMP,  3'h1, 3'h0, 5'h1};
            `endif

            //`define TEST_LWEX
            `ifdef TEST_LWEX
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'hC5};
            mem[1] = {`VMICRO16_OP_SW,   3'h0, 3'h0, 5'h1};
            mem[2] = {`VMICRO16_OP_LW,   3'h2, 3'h0, 5'h1};
            mem[3] = {`VMICRO16_OP_LWEX, 3'h2, 3'h0, 5'h1};
            mem[4] = {`VMICRO16_OP_SWEX, 3'h3, 3'h0, 5'h1};
            `endif

                                                                          //`define TEST_MULTICORE

`ifdef TEST_MULTICORE

mem[0] = {`VMICR016_OP_MOVI,
mem[1] = {`VMICR016_OP_SW,
mem[2] = {`VMICR016_OP_MOVI,
mem[3] = {`VMICR016_OP_MOVI,
mem[4] = {`VMICR016_OP_MOVI,
mem[5] = {`VMICR016_OP_MOVI,
mem[6] = {`VMICR016_OP_MOVI,
mem[7] = {`VMICR016_OP_MOVI,
mem[8] = {`VMICR016_OP_MOVI,
mem[9] = {`VMICR016_OP_SW,
`endif
  84
  85
  86
                                                                                                                                                                                                            3'h0, 8'h90};
  87
                                                                                                                                                                                                           3'h1, 8'h33};
3'h1, 3'h0, 5'h0};
  88
                                                                                                                                                                                                            3'h1, 3'h0, 3'h0, 3'h0, 8'h80};
  89
  90
  91
                                                                                                                                                                                                            3'h1, 8'h33};
  92
                                                                                                                                                                                                           3'h1, 8'h33};
3'h1, 8'h33};
  93
  94
                                                                                                                                                                                                            3'h0, 8'h91};
  95
  96
                                                                                                                                                                                                           3'h2, 3'h0, 5'h0};
  97
                                                                                 endif
  98
            //`define TEST_BR
            `ifdef TEST_BR
            mem[0] = {`VMICRO16_OP_MOVI,    3'h0, 8'h0};
            mem[1] = {`VMICRO16_OP_MOVI,    3'h3, 8'h3};
            mem[2] = {`VMICRO16_OP_MOVI,    3'h1, 8'h2};
            mem[3] = {`VMICRO16_OP_ARITH_U, 3'h0, 3'h1, 5'b11111};
            mem[4] = {`VMICRO16_OP_BR,      3'h3, `VMICRO16_OP_BR_U};
            mem[5] = {`VMICRO16_OP_MOVI,    3'h0, 8'hFF};
            `endif

            //`define ALL_TEST
            `ifdef ALL_TEST
            // Standard all test
            // REGS0
            mem[0] = {`VMICRO16_OP_MOVI, 3'h0, 8'h81};
            mem[1] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0}; // MMU[0x81] = 6
            mem[2] = {`VMICRO16_OP_SW,   3'h2, 3'h0, 5'h1}; // MMU[0x82] = 6
                                                                                                                                                                                                            3'h0, 8'h90};
                                                                                                                                                                                                           3'h1, 8'hD};
3'h1, 3'h0, 5'h0};
3'h2, 3'h0, 5'h0};
            // TIM0
            mem[7] = {`VMICRO16_OP_MOVI, 3'h0, 8'h07};
            mem[8] = {`VMICRO16_OP_LW,   3'h3, 3'h0, 5'h03};
            // UART0
            mem[9]  = {`VMICRO16_OP_MOVI, 3'h0, 8'hA0}; // UART0
            mem[10] = {`VMICRO16_OP_MOVI, 3'h1, 8'h41}; // ascii A
            mem[11] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[12] = {`VMICRO16_OP_MOVI, 3'h1, 8'h42}; // ascii B
            mem[13] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[14] = {`VMICRO16_OP_MOVI, 3'h1, 8'h43}; // ascii C
            mem[15] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[16] = {`VMICRO16_OP_MOVI, 3'h1, 8'h44}; // ascii D
            mem[17] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[18] = {`VMICRO16_OP_MOVI, 3'h1, 8'h45}; // ascii E
            mem[19] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            mem[20] = {`VMICRO16_OP_MOVI, 3'h1, 8'h46}; // ascii F
```

```
            mem[21] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            // BRAM0
            mem[22] = {`VMICRO16_OP_MOVI, 3'h0, 8'hC0};
            mem[23] = {`VMICRO16_OP_MOVI, 3'h1, 8'hA};
            mem[24] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h5};
            mem[25] = {`VMICRO16_OP_LW,   3'h2, 3'h0, 5'h5};
142
                             143
                                                                               3'h0, 8'h91};
144
                                                                               3'h1, 8'h12};
3'h1, 3'h0, 5'h0};
3'h2, 3'h0, 5'h0};
145
146
147
            // GPIO2
            mem[30] = {`VMICRO16_OP_MOVI, 3'h0, 8'h92};
            mem[31] = {`VMICRO16_OP_MOVI, 3'h1, 8'h56};
            mem[32] = {`VMICRO16_OP_SW,   3'h1, 3'h0, 5'h0};
            `endif

                             //`define TEST_BRAM
`ifdef TEST_BRAM
'/' 2 core BRAMO test
mem[0] = {`VMICR016_OP_MOVI,
mem[1] = {`VMICR016_OP_SW,
mem[2] = {`VMICR016_OP_LW,
mem[3] = {`VMICR016_OP_LW,
154
155
156
157
                                                                              3'h0, 8'hC0};
                                                                             3'h1, 8'hA};
3'h1, 3'h0, 5'h5};
3'h2, 3'h0, 5'h5};
158
159
160
161
                               endif
                       end
                 end

    always @(posedge clk) begin
        // synchronous WRITE_FIRST (page 13)
        if (mem_we) begin
                             168
169
170
171
        end else
            mem_out <= mem[mem_addr];
    end

    // TODO: Reset impl = every clock while reset is asserted, clear each cell
    //       one at a time, mem[i++] <= 0
endmodule

module vmicro16_core_mmu # (
    parameter MEM_WIDTH    = 16,
    parameter MEM_DEPTH    = 64,

    parameter CORE_ID      = 3'h0,
    parameter CORE_ID_BITS = `clog2(`CORES)
) (
    input clk,
    input reset,

    input  req,
    output busy,

    // From core
    input      [MEM_WIDTH-1:0] mmu_addr,
    input      [MEM_WIDTH-1:0] mmu_in,
    input                      mmu_we,
    input                      mmu_lwex,
    input                      mmu_swex,
    output reg [MEM_WIDTH-1:0] mmu_out,

    // interrupts
    output reg [`DATA_WIDTH*`DEF_NUM_INT-1:0] ints_vector,
    output reg [`DEF_NUM_INT-1:0]             ints_mask,

    // TO APB interconnect
    output reg [`APB_WIDTH-1:0] M_PADDR,
    output reg                  M_PWRITE,
    output reg                  M_PSELx,
    output reg                  M_PENABLE,
    output reg [MEM_WIDTH-1:0]  M_PWDATA,
    // from interconnect
    input      [MEM_WIDTH-1:0]  M_PRDATA,
    input                       M_PREADY
);
    localparam MMU_STATE_T1 = 0;
    localparam MMU_STATE_T2 = 1;
    localparam MMU_STATE_T3 = 2;
    reg [1:0] mmu_state     = MMU_STATE_T1;

    reg  [MEM_WIDTH-1:0] per_out = 0;
    wire [MEM_WIDTH-1:0] tim0_out;

    assign busy = req || (mmu_state == MMU_STATE_T2);

    // more luts than below but easier
    //wire tim0_en = (mmu_addr >= `DEF_MMU_TIM0_S)
```

```
    //           && (mmu_addr <= `DEF_MMU_TIM0_E);
    //wire sreg_en = (mmu_addr >= `DEF_MMU_SREG_S)
    //           && (mmu_addr <= `DEF_MMU_SREG_E);
    //wire intv_en = (mmu_addr >= `DEF_MMU_INTSV_S)

    wire tim0_en = ~mmu_addr[12] && ~mmu_addr[9] && ~mmu_addr[7];
    wire sreg_en = mmu_addr[7] && ~mmu_addr[4] && ~mmu_addr[5];
    wire intv_en = mmu_addr[8] && ~mmu_addr[3];
    wire intm_en = mmu_addr[8] && mmu_addr[3];

    wire apb_en  = !(|{tim0_en, sreg_en, intv_en, intm_en});
    wire tim0_we = (tim0_en && mmu_we);
    wire intv_we = (intv_en && mmu_we);
    wire intm_we = (intm_en && mmu_we);

    // Special register selects
    localparam SPECIAL_REGS = 8;
    wire [MEM_WIDTH-1:0] sr_val;

    // Interrupt vector and mask
    initial ints_vector = 0;
    initial ints_mask   = 0;
    wire [2:0] intv_addr = mmu_addr[`clog2(`DEF_NUM_INT)-1:0];
    always @(posedge clk)
        if (intv_we)
            ints_vector[intv_addr*`DATA_WIDTH +: `DATA_WIDTH] <= mmu_in;

    always @(posedge clk)
        if (intm_we)
            ints_mask <= mmu_in;

    always @(ints_vector)
        $display($time,
            "\tC%d\t\tints_vector W: | %h %",
            CORE_ID,
            ints_vector[0*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[1*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[2*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[3*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[4*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[5*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[6*`DATA_WIDTH +: `DATA_WIDTH],
            ints_vector[7*`DATA_WIDTH +: `DATA_WIDTH]
        );

    always @(intm_we)
        $display($time, "\tC%d\t\tintm_we W: %b", CORE_ID, ints_mask);

    // Output port
    always @(*)
        if      (tim0_en) mmu_out = tim0_out;
        else if (sreg_en) mmu_out = sr_val;
        else if (intv_en) mmu_out = ints_vector[mmu_addr[2:0]*`DATA_WIDTH
                                                +: `DATA_WIDTH];
        else if (intm_en) mmu_out = ints_mask;
        else              mmu_out = per_out;

    // APB master to slave interface
    always @(posedge clk)
        if (reset) begin
            mmu_state <= MMU_STATE_T1;
            M_PENABLE <= 0;
            M_PADDR   <= 0;
            M_PWDATA  <= 0;
            M_PSELx   <= 0;
            M_PWRITE  <= 0;
        end
        else
            casex (mmu_state)
                MMU_STATE_T1: begin
                    if (req && apb_en) begin
                        M_PADDR   <= {mmu_lwex,
                                      mmu_swex,
                                      CORE_ID[CORE_ID_BITS-1:0],
                                      mmu_addr[MEM_WIDTH-1:0]};

                        M_PWDATA  <= mmu_in;
                        M_PSELx   <= 1;
                        M_PWRITE  <= mmu_we;

                        mmu_state <= MMU_STATE_T2;
                    end
                end

                `ifdef FIX_T3
                    MMU_STATE_T2: begin
```

```
                        M_PENABLE <= 1;

                        if (M_PREADY == 1'b1) begin
                            mmu_state <= MMU_STATE_T3;
                        end
                    end

                    MMU_STATE_T3: begin
                        // Slave has output a ready signal (finished)
                        M_PENABLE <= 0;
                        M_PADDR   <= 0;
                        M_PWDATA  <= 0;
                        M_PSELx   <= 0;
                        M_PWRITE  <= 0;
                        // Clock the peripheral output into a reg,
                        //   to output on the next clock cycle
                        per_out   <= M_PRDATA;

                        mmu_state <= MMU_STATE_T1;
                    end
                `else
                    // No FIX_T3
                    MMU_STATE_T2: begin
                        if (M_PREADY == 1'b1) begin
                            M_PENABLE <= 0;
                            M_PADDR   <= 0;
                            M_PWDATA  <= 0;
                            M_PSELx   <= 0;
                            M_PWRITE  <= 0;
                            // Clock the peripheral output into a reg,
                            //   to output on the next clock cycle
                            per_out   <= M_PRDATA;

                            mmu_state <= MMU_STATE_T1;
                        end else begin
                            M_PENABLE <= 1;
                        end
                    end
                `endif
            endcase

                 (* ram_style = "block" *)
                 359
360
361
362
                        .PARAM_DEFAULTS_RO (CORE_ID),
363
                        .PARAM_DEFAULTS_R1 (`CORES),
.PARAM_DEFAULTS_R2 (`APB_BRAMO_CELLS),
.PARAM_DEFAULTS_R3 (`SLAVES),
364
365
366
                                          ("ram_sr")
                        .NAME
367
                 ) ram_sr (
369
                       .clk
                                          (clk).
                                          (reset)
370
                        .reset
                        .mem_addr
                                          (mmu_addr[`clog2(SPECIAL_REGS)-1:0]),
371
                                          (),
372
                        .mem_in
                        .mem_we
373
374
                        .mem_out
                                          (sr_val)
                 ):
375
376
    // Each M core has a TIM0 scratch memory
    (* ram_style = "block" *)
    vmicro16_bram # (
        .MEM_WIDTH  (MEM_WIDTH),
        .MEM_DEPTH  (MEM_DEPTH),
        .USE_INITS  (0),
        .NAME       ("TIM0")
    ) TIM0 (
        .clk        (clk),
        .reset      (reset),
        .mem_addr   (mmu_addr[7:0]),
        .mem_in     (mmu_in),
        .mem_we     (tim0_we),
        .mem_out    (tim0_out)
    );
endmodule

module vmicro16_regs # (
    parameter CELL_WIDTH        = 16,
    parameter CELL_DEPTH        = 8,
    parameter CELL_SEL_BITS     = `clog2(CELL_DEPTH),
    parameter CELL_DEFAULTS     = 0,
    parameter DEBUG_NAME,
    parameter CORE_ID           = 0,
    parameter PARAM_DEFAULTS_R0 = 16'h0000,
    parameter PARAM_DEFAULTS_R1 = 16'h0000
) (
    input clk,
```

```
    input reset,
    // Dual port register reads
    input  [CELL_SEL_BITS-1:0]  rs1, // port 1
    output [CELL_WIDTH-1 :0]    rd1,
    //input  [CELL_SEL_BITS-1:0]  rs2, // port 2
    //output [CELL_WIDTH-1 :0]    rd2,
    // EX/WB final stage write back
    input                       we,
    input  [CELL_SEL_BITS-1:0]  ws1,
    input  [CELL_WIDTH-1:0]     wd
);
    (* ram_style = "distributed" *)
    reg [CELL_WIDTH-1:0] regs [0:CELL_DEPTH-1] /*verilator public_flat*/;

    // Initialise registers with default values
    //   Really only used for special registers used by the soc
    // TODO: How to do this on reset?
    integer i;
    initial
        if (CELL_DEFAULTS)
            $readmemh(CELL_DEFAULTS, regs);
        else begin
            for (i = 0; i < CELL_DEPTH; i = i + 1)
                regs[i] = 0;
            regs[0] = PARAM_DEFAULTS_R0;
            regs[1] = PARAM_DEFAULTS_R1;
        end

                 `ifdef ICARUS
435
436
                       always @(regs)
                             437
438
439
440
                 `endif

    always @(posedge clk)
        if (reset) begin
            for (i = 0; i < CELL_DEPTH; i = i + 1)
                regs[i] <= 0;
            regs[0] <= PARAM_DEFAULTS_R0;
            regs[1] <= PARAM_DEFAULTS_R1;
        end
        else if (we) begin
            $display($time, "\tC%02h: REGS #%s: Writing %h to reg[%d]",
                CORE_ID, DEBUG_NAME, wd, ws1);

            // Perform the write
            regs[ws1] <= wd;
        end

    // sync writes, async reads
    assign rd1 = regs[rs1];
    //assign rd2 = regs[rs2];
endmodule

module vmicro16_dec # (
    parameter INSTR_WIDTH    = 16,
    parameter INSTR_OP_WIDTH = 5,
    parameter INSTR_RS_WIDTH = 3,
    parameter ALU_OP_WIDTH
) (
    //input clk,   // not used yet (all combinational)
    //input reset, // not used yet (all combinational)

    input  [INSTR_WIDTH-1:0]    instr,

    output [INSTR_OP_WIDTH-1:0] opcode,
    output [INSTR_RS_WIDTH-1:0] rd,
    output [INSTR_RS_WIDTH-1:0] ra,
    output [3:0]                imm4,
    output [7:0]                imm8,
    output [11:0]               imm12,
    output [4:0]                simm5,

    // This can be freely increased without affecting the isa
    output reg [ALU_OP_WIDTH-1:0] alu_op,

    output reg has_imm4,
    output reg has_imm8,
    output reg has_imm12,
    output reg has_we,
    output reg has_br,
    output reg has_mem,
    output reg has_mem_we,
    output reg has_cmp,

    output halt,
    output intr,

```

```
    output reg has_lwex,
    output reg has_swex

    // TODO: Use to identify bad instruction and
    //       raise exceptions
    //,output is_bad
);
    assign opcode = instr[15:11];
    assign rd     = instr[10:8];
    assign ra     = instr[7:5];
    assign imm4   = instr[3:0];
    assign imm8   = instr[7:0];
    assign imm12  = instr[11:0];
    assign simm5  = instr[4:0];

    // exme_op
                   513
514
515
516
                                                                               alu_op = `VMICRO16_ALU_NOP;
alu_op = `VMICRO16_ALU_NOP; endcase
517
                                 default:
518
519
            `VMICRO16_OP_LW:    alu_op = `VMICRO16_ALU_LW;
            `VMICRO16_OP_SW:    alu_op = `VMICRO16_ALU_SW;
            `VMICRO16_OP_LWEX:  alu_op = `VMICRO16_ALU_LW;
            `VMICRO16_OP_SWEX:  alu_op = `VMICRO16_ALU_SW;

            `VMICRO16_OP_MOV:   alu_op = `VMICRO16_ALU_MOV;
            `VMICRO16_OP_MOVI:  alu_op = `VMICRO16_ALU_MOVI;

            `VMICRO16_OP_BR:    alu_op = `VMICRO16_ALU_BR;
            `VMICRO16_OP_MULT:  alu_op = `VMICRO16_ALU_MULT;

            `VMICRO16_OP_CMP:   alu_op = `VMICRO16_ALU_CMP;
            `VMICRO16_OP_SETC:  alu_op = `VMICRO16_ALU_SETC;

            `VMICRO16_OP_BIT: casez (simm5)
                `VMICRO16_OP_BIT_OR:      alu_op = `VMICRO16_ALU_BIT_OR;
                `VMICRO16_OP_BIT_XOR:     alu_op = `VMICRO16_ALU_BIT_XOR;
                `VMICRO16_OP_BIT_AND:     alu_op = `VMICRO16_ALU_BIT_AND;
                `VMICRO16_OP_BIT_NOT:     alu_op = `VMICRO16_ALU_BIT_NOT;
                `VMICRO16_OP_BIT_LSHFT:   alu_op = `VMICRO16_ALU_BIT_LSHFT;
                `VMICRO16_OP_BIT_RSHFT:   alu_op = `VMICRO16_ALU_BIT_RSHFT;
                default:                  alu_op = `VMICRO16_ALU_BAD; endcase

            `VMICRO16_OP_ARITH_U: casez (simm5)
                `VMICRO16_OP_ARITH_UADD:  alu_op = `VMICRO16_ALU_ARITH_UADD;
                `VMICRO16_OP_ARITH_USUB:  alu_op = `VMICRO16_ALU_ARITH_USUB;
                `VMICRO16_OP_ARITH_UADDI: alu_op = `VMICRO16_ALU_ARITH_UADDI;
                default:                  alu_op = `VMICRO16_ALU_BAD; endcase


        `VMICRO16_OP_ARITH_S: casez (simm5)
            `VMICRO16_OP_ARITH_SADD:  alu_op = `VMICRO16_ALU_ARITH_SADD;
            `VMICRO16_OP_ARITH_SSUB:  alu_op = `VMICRO16_ALU_ARITH_SSUB;
            `VMICRO16_OP_ARITH_SSUBI: alu_op = `VMICRO16_ALU_ARITH_SSUBI;
            default:                  alu_op = `VMICRO16_ALU_BAD; endcase

        default: begin
            alu_op = `VMICRO16_ALU_NOP;
            $display($time, "\tDEC: unknown opcode: %h ... NOPPING", opcode);
        end
    endcase

    // Special opcodes
    // assign nop == ((opcode == `VMICRO16_OP_SPCL) & (~instr[0]));
    assign halt = ((opcode == `VMICRO16_OP_SPCL) & instr[0]);
    assign intr = ((opcode == `VMICRO16_OP_SPCL) & instr[1]);

    always @(*) case (opcode)
        `VMICRO16_OP_LW,
        `VMICRO16_OP_MOV,
        `VMICRO16_OP_MOVI,
        `VMICRO16_OP_MOVI_L,
        `VMICRO16_OP_ARITH_U,
        `VMICRO16_OP_ARITH_S,
        `VMICRO16_OP_SETC,
        `VMICRO16_OP_BIT,
        `VMICRO16_OP_MULT:    has_we = 1'b1;
        default:              has_we = 1'b0;
    endcase

    // Contains 4-bit immediate
    always @(*)
```

```
    else
        has_imm4 = 1'b0;

                              has_imm8 = 1'b1;
        default:              has_imm8 = 1'b0;
    endcase

    //// Contains 12-bit immediate
    //always @(*) case (opcode)
    //    `VMICRO16_OP_MOVI_L: has_imm12 = 1'b1;
    //    default:             has_imm12 = 1'b0;
    //endcase

    // Will branch the pc
    always @(*) case (opcode)
        `VMICRO16_OP_BR:      has_br = 1'b1;
        default:              has_br = 1'b0;
    endcase

    // Requires external memory
    always @(*) case (opcode)
        `VMICRO16_OP_LW,
        `VMICRO16_OP_SW,
        `VMICRO16_OP_LWEX,
        `VMICRO16_OP_SWEX:    has_mem = 1'b1;
        default:              has_mem = 1'b0;
    endcase

    endcase

                              has_lwex = 1'b0;
    endcase

    always @(*) case (opcode)
        `VMICRO16_OP_SWEX:    has_swex = 1'b1;
        default:              has_swex = 1'b0;
    endcase
endmodule

module vmicro16_alu # (
    parameter OP_WIDTH   = 5,
    parameter DATA_WIDTH = 16,
    parameter CORE_ID    = 0
) (
    // input clk, // TODO: make clocked

    input      [OP_WIDTH-1:0]   op,
    input      [DATA_WIDTH-1:0] a, // rs1/dst
    input      [DATA_WIDTH-1:0] b, // rs2
    input      [3:0]            flags,
    output reg [DATA_WIDTH-1:0] c
);
    localparam TOP_BIT = (DATA_WIDTH-1);
    // 17-bit register
    reg [DATA_WIDTH:0] cmp_tmp = 0; // = {carry, [15:0]}
    wire r_setc;

    always @(*) begin
        case (op)
            // branch/nop, output nothing
            `VMICRO16_ALU_NOP:         c = {DATA_WIDTH{1'b0}};

            // load/store addresses (use value in rd2)
            `VMICRO16_ALU_LW,
            `VMICRO16_ALU_SW:

            // bitwise operations
            `VMICRO16_ALU_BIT_OR:
            `VMICRO16_ALU_BIT_XOR:
```

```
            `VMICRO16_ALU_BIT_RSHFT:   c = a >> b;

            `VMICRO16_ALU_MOV:         c = b;
            `VMICRO16_ALU_MOVI:        c = b;
            `VMICRO16_ALU_MOVI_L:      c = b;

            // TODO: ALU should have simm5 as input
            `VMICRO16_ALU_ARITH_UADDI: c = a + b;

            `ifdef DEF_ALU_HW_MULT
            `VMICRO16_ALU_MULT:        c = a * b;
            `endif

            // TODO: ALU should have simm5 as input
            `VMICRO16_ALU_ARITH_SSUBI: c = $signed(a) - $signed(b);

            `VMICRO16_ALU_CMP: begin
                // TODO: Do a-b in 17-bit register
                // Set zero, overflow, carry, signed bits in result
                cmp_tmp = a - b;
                c = 0;

                // N Negative condition code flag
                // Z Zero condition code flag
                // C Carry condition code flag
                // V Overflow condition code flag
                c[`VMICRO16_SFLAG_N] = cmp_tmp[TOP_BIT];
                c[`VMICRO16_SFLAG_Z] = (cmp_tmp == 0);
                c[`VMICRO16_SFLAG_C] = 0; //cmp_tmp[TOP_BIT+1]; // not used

                // Overflow flag
                // https://stackoverflow.com/questions/30957188/
                // https://github.com/bendl/prco304/blob/master/prco_core/rtl/prco_alu.v#L50
                case(cmp_tmp[TOP_BIT+1:TOP_BIT])
                    2'b01:   c[`VMICRO16_SFLAG_V] = 1;
                    2'b10:   c[`VMICRO16_SFLAG_V] = 1;
                    default: c[`VMICRO16_SFLAG_V] = 0;
                endcase

                $display($time, "\tC%02h: ALU CMP: %h %h = %h = %h", CORE_ID, a, b, cmp_tmp, c[3:0]);
            end

            // TODO: Parameterise
            `VMICRO16_ALU_SETC: c = { {15{1'b0}}, r_setc };

            default: begin
                $display($time, "\tALU: unknown op: %h", op);
                c       = 0;
                cmp_tmp = 0;
            end
        endcase
    end

    branch setc_check (
        .flags (flags),
        .cond  (b[7:0]),
        .en    (r_setc)
    );
endmodule

// flags = 4 bit r_cmp_flags register
// cond = 8 bit VMICRO16_OP_BR_? value. See vmicro16_isa.v
module branch (
    input [3:0] flags,
    input [7:0] cond,
    output reg  en
);
    always @(*)
            default:          en = 0;
        endcase
endmodule

module vmicro16_core # (
    parameter DATA_WIDTH        = 16,
```

```
    parameter MEM_INSTR_DEPTH   = 64,
    parameter MEM_SCRATCH_DEPTH = 64,
    parameter MEM_WIDTH
    parameter CORE_ID           = 3'h0
) (
    input        clk,
    input        reset,
    output [7:0] dbug,
    output       halt,

    // interrupt sources
    input  [`DEF_NUM_INT-1:0]             ints,
    input  [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_data,
    output [`DEF_NUM_INT-1:0]             ints_ack,

    // APB master to slave interface (apb_intercon)
    output [`APB_WIDTH-1:0]  w_PADDR,
    output                   w_PWRITE,
    output                   w_PSELx,
    output                   w_PENABLE,
    output [DATA_WIDTH-1:0]  w_PWDATA,
    input  [DATA_WIDTH-1:0]  w_PRDATA,
    input                    w_PREADY

`ifndef DEF_CORE_HAS_INSTR_MEM
    , // APB master interface to slave instruction memory
    output reg [`APB_WIDTH-1:0]  w2_PADDR,
    output reg                   w2_PWRITE,
    output reg                   w2_PSELx,
    output reg                   w2_PENABLE,
    output reg [DATA_WIDTH-1:0]  w2_PWDATA,
    input      [DATA_WIDTH-1:0]  w2_PRDATA,
    input                        w2_PREADY
`endif
);
    localparam STATE_IF   = 0;
    localparam STATE_R1   = 1;
    localparam STATE_R2   = 2;
    localparam STATE_ME   = 3;
    localparam STATE_WB   = 4;
    localparam STATE_FE   = 5;
    localparam STATE_IDLE = 6;
    localparam STATE_HALT = 7;

    reg [2:0] r_state = STATE_IF;

    reg  [DATA_WIDTH-1:0] r_pc       = 16'h0000;
    reg  [DATA_WIDTH-1:0] r_pc_saved = 16'h0000;
    reg  [DATA_WIDTH-1:0] r_instr    = 16'h0000;
    wire [DATA_WIDTH-1:0] w_mem_instr_out;
    wire                  w_halt;

    assign dbug = {7'h00, w_halt};
    assign halt = w_halt;

    wire [4:0]            r_instr_opcode;
    wire [4:0]            r_instr_alu_op;
    wire [2:0]            r_instr_rsd;
    wire [2:0]            r_instr_rsa;
    reg  [DATA_WIDTH-1:0] r_instr_rdd = 0;
    reg  [DATA_WIDTH-1:0] r_instr_rda = 0;
    wire [3:0]            r_instr_imm4;
    wire [7:0]            r_instr_imm8;
    wire [4:0]            r_instr_simm5;
    wire                  r_instr_has_imm4;
    wire                  r_instr_has_imm8;
    wire                  r_instr_has_we;
    wire                  r_instr_has_br;
    wire                  r_instr_has_cmp;
    wire                  r_instr_has_mem;
    wire                  r_instr_has_mem_we;
    wire                  r_instr_halt;
    wire                  r_instr_has_lwex;
    wire                  r_instr_has_swex;

    wire [DATA_WIDTH-1:0] r_alu_out;

    wire [DATA_WIDTH-1:0] r_mem_scratch_addr = $signed(r_alu_out) + $signed(r_instr_simm5);
    wire [DATA_WIDTH-1:0] r_mem_scratch_in   = r_instr_rdd;
    wire [DATA_WIDTH-1:0] r_mem_scratch_out;
    wire                  r_mem_scratch_we   = r_instr_has_mem_we && (r_state == STATE_ME);
    reg                   r_mem_scratch_req  = 0;
    wire                  r_mem_scratch_busy;

    reg  [2:0]            r_reg_rs1 = 0;
```

```
    //wire [15:0] r_reg_rd2;
    wire [DATA_WIDTH-1:0] r_reg_wd = (r_instr_has_mem) ? r_mem_scratch_out : r_alu_out;
    wire                  r_reg_we = r_instr_has_we && (r_state == STATE_WB);

    // branching
    wire       w_intr;
    wire       w_branch_en;
    wire       w_branching = r_instr_has_br && w_branch_en;
    reg  [3:0] r_cmp_flags = 4'h00; // N, Z, C, V

    // 2 cycle register fetch
    always @(*) begin
        r_reg_rs1 = 0;
        if (r_state == STATE_R1)
            r_reg_rs1 = r_instr_rsd;
        else if (r_state == STATE_R2)
            r_reg_rs1 = r_instr_rsa;
        else
            r_reg_rs1 = 3'h0;
    end

    reg regs_use_int = 0;
    `ifdef DEF_ENABLE_INT
    wire [`DEF_NUM_INT*`DATA_WIDTH-1:0] ints_vector;
    wire [`DEF_NUM_INT-1:0]             ints_mask;
    wire                                has_int = ints & ints_mask;
    reg int_pending     = 0;
    reg int_pending_ack = 0;
    always @(posedge clk)
        if (int_pending_ack)
            // We've now branched to the isr
            int_pending <= 0;
        else if (has_int)
            // Notify fsm to switch to the ints_vector at the last stage
            int_pending <= 1;
        else if (w_intr)
            // Return to Interrupt instruction called,
            // so we've finished with the interrupt
            int_pending <= 0;
    `endif

    // Next program counter logic
    reg [`DATA_WIDTH-1:0] next_pc = 0;

    always @(posedge clk)
        if (reset)
            r_pc <= 0;
        else if (r_state == STATE_WB) begin
            `ifdef DEF_ENABLE_INT
            if (int_pending) begin
                $display($time, "\tC%02h: Jumping to ISR: %h",
                    CORE_ID,
                    ints_vector[0 +: `DATA_WIDTH]);
                // TODO: check bounds
                // Save state
                r_pc_saved      <= r_pc + 1;
                regs_use_int    <= 1;
                int_pending_ack <= 1;
                // Jump to ISR
                r_pc            <= ints_vector[0 +: `DATA_WIDTH];
            end else if (w_intr) begin
                $display($time, "\tC%02h: Returning from ISR: %h",
                    CORE_ID, r_pc_saved);

                int_pending_ack <= 0;
            end else
            `endif
            if (w_branching) begin
                $display($time, "\tC%02h: branching to %h", CORE_ID, r_instr_rdd);
                r_pc <= r_instr_rdd;

                `ifdef DEF_ENABLE_INT
                int_pending_ack <= 0;
                `endif
            end else if (r_pc < (MEM_INSTR_DEPTH-1)) begin
                // normal increment
                // pc <= pc + 1
                r_pc <= r_pc + 1;

                `ifdef DEF_ENABLE_INT
                int_pending_ack <= 0;
                `endif
            end
        end // end r_state == STATE_WB
        else if (r_state == STATE_HALT) begin
```

```
            `ifdef DEF_ENABLE_INT
            // Only an interrupt can return from halt
            // duplicate code from STATE_ME!
            if (int_pending) begin
                $display($time, "\tC%02h: Jumping to ISR: %h", CORE_ID, ints_vector[0 +: `DATA_WIDTH]);
                // TODO: check bounds
                // Save state
                r_pc_saved      <= r_pc;// + 1; HALT = stay with same PC
                regs_use_int    <= 1;
                int_pending_ack <= 1;
                // Jump to ISR
                r_pc            <= ints_vector[0 +: `DATA_WIDTH];

                int_pending_ack <= 0;
            end
            `endif

`ifndef DEF_CORE_HAS_INSTR_MEM
    initial w2_PSELx   = 0;
    initial w2_PENABLE = 0;
    initial w2_PADDR

    // cpu state machine
    always @(posedge clk)
        if (reset) begin
            r_state           <= STATE_IF;
            r_instr           <= 0;
            r_mem_scratch_req <= 0;
            r_instr_rdd       <= 0;
            r_instr_rda       <= 0;
        end
        else begin
`ifdef DEF_CORE_HAS_INSTR_MEM
            if (r_state == STATE_IF) begin
                r_instr <= w_mem_instr_out;

                $display("");
                $display($time, "\tC%02h: PC: %h", CORE_ID, r_pc);
                $display($time, "\tC%02h: INSTR: %h", CORE_ID, w_mem_instr_out);

                r_state <= STATE_R1;
            end
`else
            // wait for global instruction rom to give us our instruction
            if (r_state == STATE_IF) begin
                // wait for ready signal
                if (!w2_PREADY) begin
                    w2_PSELx   <= 1;
                    w2_PWRITE  <= 0;
                    w2_PENABLE <= 1;
                    w2_PWDATA  <= 0;
                    w2_PADDR   <= r_pc;
                end else begin
                    w2_PSELx   <= 0;
                    w2_PWRITE  <= 0;
                    w2_PENABLE <= 0;
                    w2_PWDATA  <= 0;

                    r_instr <= w2_PRDATA;

                    $display("");
                    $display($time, "\tC%02h: PC: %h", CORE_ID, r_pc);
                    $display($time, "\tC%02h: INSTR: %h", CORE_ID, w2_PRDATA);

                    r_state <= STATE_R1;
                end
            end
`endif
            else if (r_state == STATE_R1) begin
                if (w_halt) begin
                    $display("");
                    $display("");
                    $display($time, "\tC%02h: PC: %h HALT", CORE_ID, r_pc);
                    r_state <= STATE_HALT;
                end else begin
                    // primary operand
                    r_instr_rdd <= r_reg_rd1;
                    r_state     <= STATE_R2;
                end
            end
            else if (r_state == STATE_R2) begin
                // Choose secondary operand (register or immediate)
                if      (r_instr_has_imm8) r_instr_rda <= r_instr_imm8;
```

```
                else if (r_instr_has_imm4) r_instr_rda <= r_reg_rd1 + r_instr_imm4;
                else                       r_instr_rda <= r_reg_rd1;

                if (r_instr_has_mem) begin
                    r_state           <= STATE_ME;
                    // Pulse req
                    r_mem_scratch_req <= 1;
                end else
                    r_state <= STATE_WB;
            end
            else if (r_state == STATE_ME) begin
                r_state <= STATE_WB;
            end
            else if (r_state == STATE_WB) begin
                if (r_instr_has_cmp) begin
                    $display($time, "\tC%02h: CMP: %h", CORE_ID, r_alu_out[3:0]);
                    r_cmp_flags <= r_alu_out[3:0];
                end

                r_state <= STATE_FE;
            end
            else if (r_state == STATE_FE)
                r_state <= STATE_IF;
            else if (r_state == STATE_HALT) begin
                `ifdef DEF_ENABLE_INT
                if (int_pending) begin
                    r_state <= STATE_FE;
                end
                `endif
            end
        end

`ifdef DEF_CORE_HAS_INSTR_MEM
    // Instruction ROM
    (* rom_style = "distributed" *)
    vmicro16_bram # (
        .MEM_WIDTH  (DATA_WIDTH),
        .MEM_DEPTH  (MEM_INSTR_DEPTH),
        .CORE_ID    (CORE_ID),
        .USE_INITS  (1),
        .NAME       ("INSTR_MEM")
    ) mem_instr (
        .clk        (clk),
        .reset      (reset),
        // port 1
        .mem_addr   (r_pc),
        .mem_in     (0),
        .mem_we     (1'b0), // ROM
        .mem_out    (w_mem_instr_out)
    );
`endif

    // MMU
    vmicro16_core_mmu # (
        .MEM_WIDTH    (DATA_WIDTH),
        .MEM_DEPTH    (MEM_SCRATCH_DEPTH),
        .CORE_ID      (CORE_ID)
    ) mmu (
        .clk          (clk),
        .reset        (reset),
        .req          (r_mem_scratch_req),
        .busy         (r_mem_scratch_busy),
        // interrupts
        .ints_vector  (ints_vector),
        .ints_mask    (ints_mask),
        // port 1
        .mmu_addr     (r_mem_scratch_addr),
        .mmu_in       (r_mem_scratch_in),
        .mmu_we       (r_mem_scratch_we),
        .mmu_lwex     (r_instr_has_lwex),
        .mmu_swex     (r_instr_has_swex),
        .mmu_out      (r_mem_scratch_out),
        // APB master to slave
        .M_PADDR      (w_PADDR),
        .M_PWRITE     (w_PWRITE),
        .M_PSELx      (w_PSELx),
        .M_PENABLE    (w_PENABLE),
        .M_PWDATA     (w_PWDATA),
        .M_PRDATA     (w_PRDATA),
        .M_PREADY     (w_PREADY)
    );

    // Instruction decoder
    vmicro16_dec dec (
        // input
```

```
        .instr        (r_instr),
        // output async
        .opcode       (),
        .rd           (r_instr_rsd),
        .ra           (r_instr_rsa),
        .imm4         (r_instr_imm4),
        .imm8         (r_instr_imm8),
        .imm12        (),
        .simm5        (r_instr_simm5),
        .alu_op       (r_instr_alu_op),
        .has_imm4     (r_instr_has_imm4),
        .has_imm8     (r_instr_has_imm8),
        .has_we       (r_instr_has_we),
        .has_br       (r_instr_has_br),
        .has_cmp      (r_instr_has_cmp),
        .has_mem      (r_instr_has_mem),
        .has_mem_we   (r_instr_has_mem_we),
        .halt         (w_halt),
        .intr         (w_intr),
        .has_lwex     (r_instr_has_lwex),
        .has_swex     (r_instr_has_swex)
    );

    // Software registers
    vmicro16_regs # (
        .CORE_ID    (CORE_ID),
        .CELL_WIDTH (`DATA_WIDTH)
    ) regs (
        .clk        (clk),
        .reset      (reset),
        // async port 0
        .rs1        (r_reg_rs1),
        .rd1        (r_reg_rd1_s),
        // async port 1
        //.rs2      (),
        //.rd2      (),
        // write port
        .we         (r_reg_we && ~regs_use_int),
        .ws1        (r_instr_rsd),
        .wd         (r_reg_wd)
    );

    // Interrupt replacement registers
    `ifdef DEF_ENABLE_INT
    vmicro16_regs # (
        .CORE_ID    (CORE_ID),
        .CELL_WIDTH (`DATA_WIDTH),
        .DEBUG_NAME ("REGSINT")
    ) regs_intr (
        .clk        (clk),
        .reset      (reset),
        // async port 0
        .rs1        (r_reg_rs1),
        .rd1        (r_reg_rd1_i),
        // async port 1
        //.rs2      (),
        //.rd2      (),
        // write port
        .we         (r_reg_we && regs_use_int),
        .ws1        (r_instr_rsd),
        .wd         (r_reg_wd)
    );
    `endif

    // ALU
    vmicro16_alu # (
        .CORE_ID(CORE_ID)
    ) alu (
        .op         (r_instr_alu_op),
        .a          (r_instr_rdd),
        .b          (r_instr_rda),
        .flags      (r_cmp_flags),
        // async output
        .c          (r_alu_out)
    );

    branch branch_check (
        .flags      (r_cmp_flags),
        .cond       (r_instr_imm8),
        .en         (w_branch_en)
    );
endmodule
```