# FPGA-based RISC Microprocessor and Compiler (Rev. 2.60)

PRCO304 - Final Stage Computing Project

**Ben Lancaster 10424877** April 5, 2018

# **Revision History**

Table 1: Document revisions.

| Date       | Version | Changes                                   |
|------------|---------|-------------------------------------------|
| 30/03/2018 | 2.60    | Add word count below TOC.                 |
| 29/03/2018 | 2.50    | Add chapter table of contents.            |
| 29/03/2018 | 2.40    | Add section 3.6 Testing and Verification. |
| 28/03/2018 | 2.30    | Add section 4.7.1 Variables.              |
| 24/03/2018 | 2.20    | Add section 4.7.2 PUSH and POP.           |
| 22/03/2018 | 2.10    | Add section 4.5 AST Generation.           |
| 15/03/2018 | 2.00    | Add section 4.4 Text Grammar.             |
| 11/03/2018 | 1.00    | Initial section outline.                  |

# **Abstract**

ben

# **Table of Contents**

| Section | Title                                                |
|---------|------------------------------------------------------|
|         | List of Figures                                      |
|         | List of Tables                                       |
| 1       | Embedded Processors                                  |
| 1.1     | Introduction                                         |
| 1.2     | Background                                           |
|         | 1.2.1 Current Implementations                        |
| 1.3     | Project Overview                                     |
|         | 1.3.1 Core Deliverables                              |
|         | 1.3.2 Extended Deliverables                          |
| 1.4     | Legal and Ethical Considerations                     |
|         | 1.4.1 Privacy                                        |
|         | 1.4.2 Fit for Purpose                                |
|         | 1.4.3 Third-party Libraries                          |
|         | 1.4.4 Generated Code                                 |
| 2       | Project Management                                   |
| 2.1     | Time Management                                      |
| 2.2     | Version Control                                      |
| 2.3     | Method of Approach                                   |
| 2.4     | Requirements                                         |
| 2.5     | Resources and Dependencies                           |
| 3       | PRCO304 Processor Design                             |
| 3.1     | Introduction                                         |
| 3.2     | High Level Design                                    |
| 3.3     | Registers                                            |
|         | 3.3.1 General Purpose Registers                      |
|         | 3.3.2 Special Registers                              |
| 3.4     | Instruction Set Architecture                         |
|         | 3.4.1 Instruction Types                              |
|         | 3.4.2 Instructions                                   |
|         | 3.4.3 Conditional Branching                          |
|         | 3.4.4 Design Considerations                          |
| 3.5     | Pipeline Architecture                                |
| 3.6     | Testing and Verification                             |
| 4       |                                                      |
| 4.3     | Implementation                                       |
|         | 4.3.1 Program Operation                              |
| 4.4     | Text Grammar                                         |
|         | 4.4.1 Text Parser                                    |
| 4.5     | AST Generation                                       |
| 4.6     | Optimisation                                         |
|         | 4.6.1 Constant Folding                               |
|         | 4.6.2 Unreachable Code Elimination                   |
| 4.7     | Code Generation                                      |
|         | 4.7.1 Variables                                      |
|         | 4.7.2 PUSH and POP                                   |
| 4.8     | Assembling                                           |
|         | 4.8.1 Executable Layout                              |
|         | 4.8.2 Limitations                                    |
| 4.9     | Testing and Verification                             |
| 4.10    | Sub-project Review                                   |
|         | 4.10.1 Core Deliverables                             |
|         | 4.10.2 Extended Deliverables                         |
| 5       | Post-Project                                         |
| 5.1     | Project Post-mortem                                  |
|         | 5.1.1 Project Objectives                             |
|         | 5.1.2 Development Process                            |
|         | 5.1.3 Personal Contributions                         |
| 5.2     | Conclusion                                           |
| 6       | References                                           |
| 7       | Appendices                                           |
| 7.1     | Appendix A. User Guides                              |
|         | 7.1.1 PRCO304 Core Reference Guide                   |
|         | 7.1.2 PRCO304 Compiler Reference Guide               |
|         | 7.1.3 PRCO304 Emulator Reference Guide               |
|         | 7.1.4 PRCO304 Processor Instruction Set Architecture |
| 7.2     | Appendix B. Project Management Artefacts             |
|         | 7.2.1 Project Initiation Document                    |
|         | 7.2.2 Highlight Reports                              |
| 7.3     | Appendix C. Other Documents                          |
|         | 7.3.1 Compiler Functional Requirements               |
|         | 7.3.2 Compiler Sequence Diagram                      |

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 2.1    | Scarab Hardware MiniSpartan6+ board layout |
| 2.2    | Digilent Arty Artix-7 board |
| 3.1    | PRCO304 processor block diagram |
| 3.2    | … |
| 3.3    | … |
| 3.4    | … Counter (pc); current Op code (q_op); and ALU result (q_result) |
| 4.1    | BNF definition for the input programming language |
| 4.2    | An AST structure representing a parsed function. It contains sub-structures pointing to its prototype, body, exit statement, and a list of local variables. (ast.h:63) |
| 4.3    | UML class diagram showing the AST structures and their connections. The struct ast_i structure is a top level structure that contains pointers to specific AST items (such as ast_func and ast_lvar). It is a self-referencing structure and can be iterated over as a linked list through its \*next property using the provided macro: list_for_each(). It can be thought of as a generic header for each AST type, allowing it to be passed as a void\* and still identified through its enum ast_type type parameter |
| 4.4    | Example of an expression suitable for constant folding |
| 4.5    | Example of an expression the optimiser cannot identify as constant |
| 4.6    | AST transformation performed by Constant Folding |
| 4.7    | Disassembly of the output machine code for the high-level code (4.8) |
| 4.8    | Input high-level code showing 3 variable declarations and references |
| 4.9    | Example machine code generation for local variables |
| 4.10   | PUSH emulation. The Stack Pointer is decremented by the amount to store on the stack (1 word), followed by storing the destination register (*rd*) at the new Stack Pointer |
| 4.11   | POP emulation. The value pointed to by the Stack Pointer is loaded into the destination register (*rd*), followed by incrementing the Stack Pointer by the size of the data type (1 word) |
| 4.12   | … |
| 4.13   | … to pop stack into Ax |
| 7.2    | PRCO304 compiler functional requirements and their technical implementation requirements |
| 7.3    | UML sequence diagram for the PRCO304 compiler |

# **List of Tables**

| Table | Caption |
|-------|---------|
| 1     | Document revisions |
| 3.1   | General purpose registers |
| 3.2   | Special registers |
| 3.3   | Status Register breakdown |
| 3.4   | The 2 instruction format types used by the PRCO304 processor |
| 3.5   | Number of respondents for treatment |
| 3.6   | Conditional jump immediate bits |
| 4.1   | Compiler Core Deliverables Review |
| 4.2   | Compiler Extended Deliverables Review |
| 7.1   | Conditional jump immediate bits |
| 7.2   | Initial Project Plan time breakdown. \*Expected time. Shaded stages are time varying periods for bug fixing. |
| 7.3   | Initial Quality Plan |

### **Word Count**

- Words in text: 4519
- Words in headers: 116
- Words outside text (captions, etc.): 213
- Number of headers: 60
- Number of floats/tables/figures: 19
- Number of math inlines: 0
- Number of math displayed: 0

# **Embedded Processors**

| 1.1 | Introduction                     |
|-----|----------------------------------|
| 1.2 | Background                       |
|     | 1.2.1 Current Implementations    |
| 1.3 | Project Overview                 |
|     | 1.3.1 Core Deliverables          |
|     | 1.3.2 Extended Deliverables      |
| 1.4 | Legal and Ethical Considerations |
|     | 1.4.1 Privacy                    |
|     | 1.4.2 Fit for Purpose            |
|     | 1.4.3 Third-party Libraries      |
|     | 1.4.4 Generated Code             |

## 1.1 Introduction

Modern computing and electronics equipment, such as function generators, oscilloscopes, and spectrum analysers, uses FPGAs to implement its compute-intensive logic. These FPGAs are often accompanied by a small, low-cost microprocessor that supervises the system and provides interfaces to external peripherals.

The aim of this project is to implement this side-microprocessor inside the FPGA to save on BOM costs, PCB space, and power, all of which contribute to higher development and product costs. While savings can be made by removing the side-microprocessor, the product may need a larger FPGA to accommodate the embedded microprocessor. The project will produce a small soft-core CPU design and a compiler. Although there is no direct client for this project, I believe it will produce an attractive product for FPGA-based product designers wishing to employ an embedded processor solution.

## 1.2 Background

### 1.2.1 Current Implementations


## 1.3 Project Overview

This project aims to provide an efficient and cost-saving alternative for board and hardware product designers who use side-microprocessors, by designing, implementing, and demonstrating a small, portable FPGA processor core that can be used in place of the side-microprocessor.

The processor core will implement its own processor and instruction set architecture, so a compiler and assembler will also be provided so that software can easily be executed on the processor.

### 1.3.1 Core Deliverables

These core (C) deliverables are the base requirement for the project to be released in a functional and worthwhile state.

- C1. To improve my knowledge and experience of FPGA development, processor architecture, compilers, and embedded systems engineering.
- C2. To build a working and operational soft-core processor core capable of performing simple tasks.
- C3. Implementation of the soft-core processor design on real hardware.
- C4. To provide product designers with an affordable alternative to a side-microprocessor in their FPGA-based products.
- C5. To provide a technical processor reference guide and specification for the embedded core.

### 1.3.2 Extended Deliverables

These extended (E) deliverables may not be achievable in the time frame specified in section 2.1, as they may require extra time to design and implement, more experience or skill, or resources that are currently unattainable.

- E1. To provide embedded products with a convenient solution for in-field updating.
- E2. To provide easy interfacing between the FPGA design and the embedded core.
- E3. GCC/LLVM/8CC compiler backend for C programming.
- E4. Wishbone interface for easier modularity and inter-module communication.
- E5. Multi-core design with Wishbone (2).
- E6. Configurable build options (register/bus widths, optimisations/pipelining, user/privileged mode to support modern operating systems).
- E7. Memory management modules to provide protected and virtual memory lookup tables.

## 1.4 Legal and Ethical Considerations

### 1.4.1 Privacy

The PRCO304 processor will be able to read and write to all data passing through it and control all connected peripherals (such as UARTs, SDRAMs, and SD Cards). The processor does not track or store usage behaviour, instructions and their frequency, memory contents, timing statistics, or any other usage metric.

### 1.4.2 Fit for Purpose

The PRCO304 processor is **not** designed to run general purpose operating systems, such as Linux or an embedded RTOS. All memory devices attached to the FPGA are fully accessible to the processor core and to the instructions and programs running on it, meaning that operating systems or secure applications storing private and sensitive information are not protected by modern processor features such as privilege modes and virtual memory sections. The processor lacks common components required to run modern operating systems, such as a memory management unit (MMU) and privilege modes, and so such software **should not be run on the processor**.

The PRCO304 processor is **not** designed to run in high-reliability or safety-critical environments that require established safety standards, such as the UK Defence Standard 00-56 (Bowen and Stavridou, 1993) and IEC 61508 (Bell, 2006).

The PRCO304 processor is **not** designed for implementation in silicon and makes no guarantees of reliability or performance in this format.

The PRCO304 processor, by design, should be used as a replacement for a simple micro-controller accompanying a main processing module.

### 1.4.3 Third-party Libraries

This project uses only one external library, which provides the processor core's universal asynchronous receiver-transmitter (UART) module and does not depend on any other libraries. This allows me to guarantee that the project rights are secure and that application behaviour is well-defined and predictable (no exploits introduced or injected from external libraries). The UART module features a large first-in-first-out (FIFO) buffer for temporary storage of incoming and outgoing messages. This FIFO is internal to the FPGA design and so is protected from external viewing or modification by probing the board on which the core is running.

The compiler sub-project does not use any external library dependencies, does not record telemetry or usage statistics, and does not require an internet connection to run.

### 1.4.4 Generated Code

The code generated by the compiler is **not guaranteed** to:

- **Produce code for secure environments**. The compiler will not randomise, obfuscate, or split up and spread output code. Output machine code will be in a predictable format (global variables in low-memory, instruction memory in middle-memory, and stack memory in high-memory), making the binary easily subject to reverse-engineering and modification.
- **Produce constant time executable code for expressions**. For example, the compiler output for an *if* statement may implicitly vary depending on its condition expression, which may have been optimised out, constant-folded, or left unoptimised. This also applies to user code aiming to create reliable and accurate time delay loops: although the processor does not perform optimisations such as instruction caching or branch prediction, memory accesses and ALU operations may vary in time, resulting in unreliable instruction times.

# **Project Management**

| 2.1 | Time Management            |
|-----|----------------------------|
| 2.2 | Version Control            |
| 2.3 | Method of Approach         |
| 2.4 | Requirements               |
| 2.5 | Resources and Dependencies |

## 2.1 Time Management

## 2.2 Version Control

Version control will be used to improve workflow, reference and review code changes, and protect the project from data loss and corruption. GitHub, a git hosting provider, will host all project files, including documentation and design files.

The repository can be found here: https://github.com/bendl/prco304.

## 2.3 Method of Approach

Development of the **core** and **compiler** will be done in separate stages of the project (see section 2.1). The two deliverables will be split into 2 sub-projects. Both sub-projects will employ the **Agile development process**, using sprints to split tasks into sub-tasks and scrums to discuss progress, features, and changes. This technique allows tasks to be revisited so that their implementation can be tweaked and iterated on, which will be key when incrementally adding features to both sub-projects, for example, extending the core's ALU module to add conditional branching, or adding new instructions to the core's decoder module.

## 2.4 Requirements

## 2.5 Resources and Dependencies

For the first half of the development cycle, the core can be developed and verified using the Verilog simulator and test suite **Verilator** and the VHDL and Verilog simulator **iSim**.

The second half of development will require deploying and debugging on real hardware. This will require an FPGA development kit. To better emulate customer products, the development kit should feature common components such as LEDs, GPIO, a USB interface, flash-based storage and memory, and optionally an analogue audio output port. The project targets low-to-mid-range FPGA devices, namely the popular, affordable, yet feature-rich Spartan-6 and Artix-7 FPGAs. From my placement, I have gained experience with Xilinx FPGAs and so will be targeting them for this project to reduce risk and development time.

The following FPGA development kits are suitable for this project:

1. MiniSpartan6+ - Scarab Hardware - \$79 (already owned) (MiniSpartan6+, 2014). The MiniSpartan6+ features a Spartan-6 XC6SLX9 FPGA, 8 LEDs, 2 digital and analogue headers, FT2232 FTDI USB to JTAG, 64Mb SPI flash memory, 32MB SDRAM, an audio output jack, and a MicroSD socket.



Figure 2.1: Scarab Hardware MiniSpartan6+ board layout.

2. Arty Artix-7 FPGA Development Board - Digilent - \$100 (Arty Artix-7 FPGA Development Board, 2015). The Arty development board features a larger Artix-7 35T FPGA with over 20x the number of logic cells and block memory compared to the LX9 in the MiniSpartan6+. The board components include 256MB DDR3 RAM, 16MBx4 SPI flash memory, USB-JTAG, 8 LEDs (4 of which are RGB), 4 switches, 4 buttons, and multiple Pmod connectors.

The greater number of IO options and larger FPGA make the Arty board better suited to emulating real customer products.



Figure 2.2: Digilent Arty Artix-7 board.

The project will require a computer or laptop to develop the core and compiler on and continuous integration systems to perform testing on the incremental builds. For the project demo, an oscilloscope (already owned) or digital logic analyser may be required to demonstrate some of the core's features.

# **PRCO304 Processor Design**

| 3.1 | Introduction                    |
|-----|---------------------------------|
| 3.2 | High Level Design               |
| 3.3 | Registers                       |
|     | 3.3.1 General Purpose Registers |
|     | 3.3.2 Special Registers         |
| 3.4 | Instruction Set Architecture    |
|     | 3.4.1 Instruction Types         |
|     | 3.4.2 Instructions              |
|     | 3.4.3 Conditional Branching     |
|     | 3.4.4 Design Considerations     |
| 3.5 | Pipeline Architecture           |
| 3.6 | Testing and Verification        |

## 3.1 Introduction

The PRCO304 Processor Design is the first of two deliverable sub-projects required for this project. The processor is designed to be a small Verilog module that can be easily instantiated in existing FPGA-based Verilog projects.

The processor core is not designed for physical implementation in silicon but rather for FPGA devices.

## 3.2 High Level Design

The PRCO304 processor is a modularised processor with independent components for the ALU, Registers, RAM, and its peripherals.



Figure 3.1: PRCO304 processor block diagram.

## 3.3 Registers

PRCO304 has a total of 8 addressable, read and write registers. These registers are identified by the letters A through H.

### 3.3.1 General Purpose Registers

Registers A through E are designed for general purpose use and are safe to store user values over the run-time of the processor.

Table 3.1: General purpose registers.

| Registers   | Bits | Description                 |
|-------------|------|-----------------------------|
| A through E | 15:0 | 5 General purpose registers |

Instructions that require a destination register, such as CMP, can reference any register (including the special registers, if required). For example, the CMP instruction places the result of the comparison in the destination register, overwriting any value already present in that register.

### 3.3.2 Special Registers

Registers F through H are special registers within the processor. The processor cannot guarantee that a value written or read in these registers will persist over the run-time of the processor. Erroneously writing to these registers may severely affect program and processor behaviour.

Even though all registers can be used at the will of the programmer, it is recommended to isolate a few registers to provide special features, such as RAM stack management, interrupts, and IO multiplexing.

| Registers | Bits | Description     |
|-----------|------|-----------------|
| F         | 15:0 | Status Register |
| G         | 15:0 | Base Pointer    |
| H         | 15:0 | Stack Pointer   |

Table 3.2: Special registers.

#### **Status Register**

The Status Register is a dedicated register used by the ALU to provide additional information on results of instructions. Using the Status Register is essential for programs wanting to perform conditional branching or operate on dynamic data.

| Bit | Name | Description                                             |
|-----|------|---------------------------------------------------------|
| 0   | SR_Z | Set if the result of a CMP instruction is 0.            |
| 1   | SR_E | Set if the two operands of a CMP instruction are equal. |
| 2   | SR_S | Set if operand B is greater than operand A.             |

Table 3.3: Status Register breakdown.

By default, the JMP instruction will read the Status Register to compare against the instruction's conditional jump parameter.
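
The flag definitions in Table 3.3 can be sketched in C as below. This is a minimal model rather than the processor's RTL: the function name `cmp_status` is hypothetical, the CMP result is assumed to be the 16-bit difference A - B, and the SR_S comparison is assumed to be signed.

```c
#include <stdint.h>

/* Status Register bit masks, following Table 3.3. */
#define SR_Z (1u << 0) /* result of CMP is 0    */
#define SR_E (1u << 1) /* operands are equal    */
#define SR_S (1u << 2) /* operand B > operand A */

/*
 * Hypothetical model of the flags produced by CMP.
 * Assumptions: the CMP result is A - B truncated to 16 bits,
 * and the operands are compared as signed values.
 */
uint16_t cmp_status(int16_t a, int16_t b)
{
    uint16_t sr = 0;

    if ((uint16_t)(a - b) == 0)
        sr |= SR_Z;
    if (a == b)
        sr |= SR_E;
    if (b > a)
        sr |= SR_S;

    return sr;
}
```

Under these assumptions, comparing 1 with 5 sets only SR_S, while comparing two equal values sets both SR_Z and SR_E.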

#### **Base Pointer**

The PRCO304 processor assumes that the compiler will employ a stack management scheme similar to that of x86 machines. Under this scheme, the last 2 registers (G and H) are dedicated to stack management. The Base Pointer register is used in a similar way to the x86 Base Pointer register.

Compilers and code generators should utilise this register for storing the address of the current stack frame. By utilising the register this way, features such as local and passed variables become available as they are addressable by offsetting the Base Pointer by a constant value.

#### **Stack Pointer**

The Stack Pointer is similar to the x86 Stack Pointer in that it stores the address of the top of the stack. This register is used primarily for PUSH and POP operations (see section 4.7.2 PUSH and POP for example usage).

## 3.4 Instruction Set Architecture

The PRCO304 processor implements its own fixed 16-bit instruction set.

### 3.4.1 Instruction Types

Table 3.4: The 3 instruction format types used by the PRCO304 processor.

| Type   | Bits  |      |     |     |
|--------|-------|------|-----|-----|
| Type 1 | 15-11 | 10-8 | 7-5 | 4-0 |
| Type 2 | 15-11 | 10-8 | 7-0 |     |
| Type 3 | 15-11 |      | 7-0 |     |
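
The three formats can be packed with a few shifts and masks. The sketch below is illustrative only: the function names and the `REG_*` enumeration are hypothetical, and registers A through H are assumed to map to indices 0 through 7. That assumption is consistent with the machine words 1ee0, 5f01, and 08dd shown in the disassembly in section 4.7.1.

```c
#include <stdint.h>

/* Register indices, assuming A..H map to 0..7 (G = Bp, H = Sp). */
enum prco_reg_idx { REG_A, REG_B, REG_C, REG_D, REG_E, REG_F, REG_G, REG_H };

/* Type 1: opcode[15:11] rd[10:8] ra[7:5] imm5[4:0] */
uint16_t encode_type1(unsigned op, unsigned rd, unsigned ra, int imm5)
{
    return (uint16_t)(((op & 0x1Fu) << 11) | ((rd & 0x7u) << 8) |
                      ((ra & 0x7u) << 5) | ((unsigned)imm5 & 0x1Fu));
}

/* Type 2: opcode[15:11] rd[10:8] imm8[7:0] */
uint16_t encode_type2(unsigned op, unsigned rd, int imm8)
{
    return (uint16_t)(((op & 0x1Fu) << 11) | ((rd & 0x7u) << 8) |
                      ((unsigned)imm8 & 0xFFu));
}

/* Type 3: opcode[15:11] imm8[7:0], bits 10-8 unused */
uint16_t encode_type3(unsigned op, int imm8)
{
    return (uint16_t)(((op & 0x1Fu) << 11) | ((unsigned)imm8 & 0xFFu));
}
```

For example, `encode_type1(0x01, REG_A, REG_G, -3)` packs LW with a signed offset of -3 from the Base Pointer and yields 0x08DD.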

### 3.4.2 Instructions

Table 3.5: PRCO304 instructions, their encodings, and semantics.

| Type 1 | 15-11 | 10-8 | 7-5   | 4-0   | Semantics                                                                        |
|--------|-------|------|-------|-------|----------------------------------------------------------------------------------|
| Type 2 | 15-11 | 10-8 | 7-0   |       | Semantics                                                                        |
| Type 3 | 15-11 |      | 7-0   |       | Semantics                                                                        |
| NOP    | 00000 | X    | X     | X     | PC <= PC + 1                                                                     |
| LW     | 00001 | Rd   | Ra    | Simm5 | Rd <= RAM[Ra + Simm5]                                                            |
| SW     | 00010 | Rd   | Ra    | Simm5 | RAM[Ra + Simm5] <= Rd                                                            |
| MOV    | 00011 | Rd   | Ra    | X     | Rd <= Ra                                                                         |
| MOVI   | 00100 | Rd   | Simm8 |       | Rd <= Simm8                                                                      |
| ADD    | 01000 | Rd   | Ra    | X     | Rd <= Rd + Ra                                                                    |
| ADDI   | 01001 | Rd   | Simm8 |       | Rd <= Rd + Simm8                                                                 |
| SUB    | 01010 | Rd   | Ra    | X     | Rd <= Rd - Ra                                                                    |
| SUBI   | 01011 | Rd   | Simm8 |       | Rd <= Rd - Simm8                                                                 |
| JMP    | 01100 | Rd   | Imm8  |       | See Conditional Branching.                                                       |
| CMP    | 01101 | Rd   | Ra    | Rb    | Set SR flags                                                                     |
| HALT   | 10010 |      | X     |       | Stop the processor.                                                              |
| WRITE  | 10011 | Rd   | Imm8  |       | See ??                                                                           |
| READ   | 10100 | Rd   | Imm8  |       | See ??                                                                           |
| SETC   | 10101 | Rd   | Imm8  |       | Set Rd to 1 if Imm8 is set in Status Register from last CMP instruction, else 0. |

### 3.4.3 Conditional Branching

Table 3.6 below details each conditional branch parameter and how it is evaluated in the Status Register.

| Instruction | 15-11 | 10-8 | 7-0       | Semantics                  | Status Register  |
|-------------|-------|------|-----------|----------------------------|------------------|
| JMP         | 01100 | Rd   | 0000 0000 | Unconditional Jump         | Any              |
| JE          | 01100 | Rd   | 0000 0001 | Jump Equal                 | ZF=1             |
| JNE         | 01100 | Rd   | 0000 0010 | Jump Not Equal             | ZF=0             |
| JG          | 01100 | Rd   | 0000 0011 | Jump Greater Than          | ZF=0 and SF=OF   |
| JGE         | 01100 | Rd   | 0000 0100 | Jump Greater Than or Equal | SF=OF            |
| JL          | 01100 | Rd   | 0000 0101 | Jump Less Than             | SF<>OF           |
| JLE         | 01100 | Rd   | 0000 0110 | Jump Less Than or Equal    | ZF=1 or SF<>OF   |
| JS          | 01100 | Rd   | 0000 0111 | Jump Signed                | SF=1             |
| JNS         | 01100 | Rd   | 0000 1000 | Jump Not Signed            | SF=0             |

Table 3.6: Conditional jump immediate bits
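
The evaluation rules in Table 3.6 can be written as a single predicate. The sketch below is an illustration, not the processor's implementation: the enumeration and function names are hypothetical, and the flags are passed in as separate ZF, SF, and OF values exactly as the table names them.

```c
/* Condition codes carried in the JMP immediate (Table 3.6).
 * The identifier names are hypothetical. */
enum jmp_cond {
    COND_JMP = 0, COND_JE, COND_JNE, COND_JG, COND_JGE,
    COND_JL, COND_JLE, COND_JS, COND_JNS
};

/* Returns 1 if the jump is taken for the given flags, else 0. */
int jmp_taken(unsigned cond, int zf, int sf, int of)
{
    /* Normalise the flags to 0 or 1 so they can be compared directly. */
    zf = !!zf;
    sf = !!sf;
    of = !!of;

    switch (cond) {
    case COND_JMP: return 1;
    case COND_JE:  return zf;
    case COND_JNE: return !zf;
    case COND_JG:  return !zf && sf == of;
    case COND_JGE: return sf == of;
    case COND_JL:  return sf != of;
    case COND_JLE: return zf || sf != of;
    case COND_JS:  return sf;
    case COND_JNS: return !sf;
    default:       return 0; /* unknown condition code: never taken */
    }
}
```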

### 3.4.4 Design Considerations

The PRCO304 processor's ISA has been through multiple iterations.

#### **Opcode Bits**

Initially, the opcode length was 4 bits, allowing a total of 16 unique opcodes. This was later changed to 5 bits to add more opcodes.

#### **SETC Instruction**

The SETC instruction was added to reduce the number of instructions required to perform boolean logic operations on registers. Without the SETC instruction, to evaluate the expression 1 < 5, the compiler would need to emit multiple JMP instructions to set the result to 0 or 1 and JMP over the other result. In my testing, the compiler would require between 5 and 8 instructions for each boolean expression.

The SETC instruction is inspired by the x86 instruction: SETcc Set Byte on Condition (Shanley, 2010).

With the introduction of the SETC instruction late in development, the number of instructions could be reduced to around two (one for the initial comparison, and one for setting 1 or 0 with SETC). This greatly improved program execution time and size.

if (1 < 5) {

## 3.5 Pipeline Architecture

The PRCO304 processor employs a *feed-forward* pipeline strategy. This pipeline supports:

- Time-varying processes: Multi-clock cycle decoding; Memory access; ALU ops.
- Module re-ordering: Instruction dependencies; Module skipping; Output redirection.
- Interruption (see section ??: ??).

As the pipeline is feed-forward, no information is sent back to previous modules to tell them of their status. This means that if a module is stalled (due to multi-cycle processes or because later modules are stalled) and the previous module is ready, the previous module will signal that information is ready to be taken, but the stalled module is unable to take it as it is busy. The pipeline resolves this issue by its cyclic nature: only 1 module is processing data at any time. Of course, the downside to this approach is that instruction parallelism is reduced.



Figure 3.2: The feed-forward pipeline interconnect diagram used by the PRCO304 processor.

The pipeline structure is described in figure 3.2 (above). The general order of the modules is shown from left to right, but this can change due to the pipeline's re-ordering functionality.

The Decoder module will decode instruction words from memory and will output appropriate signals containing the requirements of the instruction, such as requiring register write access, any ALU operation, and whether the instruction requires access to internal/external memory.

To improve instruction performance, the decoder can also choose what modules are required and when they are called. For example, for the MOVI (move immediate) instruction the Decoder will assign the following modules in the following order: ALU and Register write, resulting in a total of 5 stages (including PC, Fetch, and Decode). The last module in this pipeline, the Register write, will raise the q\_pipe\_end signal indicating that the pipeline has finished and that fetching of the next instruction should start.

For the NOP instruction, the decoder identifies that the instruction has no dependencies and will hence raise the q\_pc\_inc signal, resulting in only 3 pipeline stages.

For instructions that require RAM access, a typical pipeline order might look like: PC, Fetch, Decoder, Register Read, ALU, RAM, resulting in 6 stages being used.
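
The stage orders described above can be captured as a small lookup. This is a software sketch of the decoder's module selection, not the Verilog itself: the enumerations, the three instruction classes, and `plan_stages` are all hypothetical names, and only the three stage sequences quoted in the text are modelled.

```c
#include <stddef.h>

/* Pipeline modules named in the text. */
enum stage {
    ST_PC, ST_FETCH, ST_DECODE, ST_REG_READ, ST_ALU, ST_RAM, ST_REG_WRITE
};

/* Hypothetical instruction classes covering the three examples given. */
enum instr_class { CLASS_NO_DEPS, CLASS_MOVE_IMM, CLASS_RAM_ACCESS };

#define MAX_STAGES 8

/* Fills out[] with the stage order and returns the number of stages. */
size_t plan_stages(enum instr_class cls, enum stage out[MAX_STAGES])
{
    size_t n = 0;

    /* Every instruction passes through PC, Fetch, and Decode. */
    out[n++] = ST_PC;
    out[n++] = ST_FETCH;
    out[n++] = ST_DECODE;

    switch (cls) {
    case CLASS_MOVE_IMM:   /* ALU, then Register write: 5 stages */
        out[n++] = ST_ALU;
        out[n++] = ST_REG_WRITE;
        break;
    case CLASS_RAM_ACCESS: /* Register Read, ALU, RAM: 6 stages */
        out[n++] = ST_REG_READ;
        out[n++] = ST_ALU;
        out[n++] = ST_RAM;
        break;
    case CLASS_NO_DEPS:    /* nothing further: 3 stages */
    default:
        break;
    }
    return n;
}
```

Because only one module is active at a time, the stage count is also a lower bound on the number of clock cycles an instruction occupies.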



Figure 3.3: PRCO304 processor instruction cycle time diagram.

## 3.6 Testing and Verification

Each module within the PRCO304 processor has a corresponding Verilator and iSim testbench.

• **Verilator testbenches** are used to automatically verify correct behaviour of the RTL code. These testbenches use the Verilator framework to compile and simulate Verilog modules.

These tests produce an output report detailing test results and real register values. The Verilator testbenches used in this project are found in prco\_core/rtl/test/verilator and can be run using the script: make\_test.sh.

Running test: ALU OR 2 ALU\_OP\_WRITE/READ 000a

PASS: 10 10

Running test: ALU OR 3 ALU\_OP\_WRITE/READ 0004

FAIL: Got 4 Expected 7

\_\_\_\_\_

14/27 tests passed.

An example test report for the ALU running OR instructions on different operands and immediate values is shown above.

• **iSim testbenches** are used to better visualise signal states and changes over time. These testbenches require manual verification and so it can take a considerable amount of time to verify a module.



Figure 3.4: iSim simulation showing high-level signals in the processor core, including: Program Counter (pc); current Op code (q\_op); and ALU result (q\_result).

• **Single-step implementation runs** are used to verify the correct behaviour of the RTL code on a real FPGA device.

The PRCO304 processor core features a single-step input line that can be pulsed to signal the core to execute the next instruction. In these tests, generally the first register Ax is redirected to the 8 LEDs on the development board, allowing the tester to see its contents. However, only the high or the low byte can be viewed at any one time (as registers are 16 bits wide). UART printing is also used to visualise register contents; however, integer to ASCII conversion is not implemented, so only single digits can be displayed in ASCII.

# **PRCO304 Compiler**

| 4.1  | Introduction                       |
|------|------------------------------------|
| 4.2  | Functional Requirements            |
| 4.3  | Implementation                     |
|      | 4.3.1 Program Operation            |
| 4.4  | Text Grammar                       |
|      | 4.4.1 Text Parser                  |
| 4.5  | AST Generation                     |
| 4.6  | Optimisation                       |
|      | 4.6.1 Constant Folding             |
|      | 4.6.2 Unreachable Code Elimination |
| 4.7  | Code Generation                    |
|      | 4.7.1 Variables                    |
|      | 4.7.2 PUSH and POP                 |
| 4.8  | Assembling                         |
|      | 4.8.1 Executable Layout            |
|      | 4.8.2 Limitations                  |
| 4.9  | Testing and Verification           |
| 4.10 | Sub-project Review                 |
|      | 4.10.1 Core Deliverables           |
|      | 4.10.2 Extended Deliverables       |

## 4.1 Introduction

The PRCO304 compiler is the second of two sub-project deliverables for this project.

The PRCO304 compiler is a command line based software tool used to convert a high-level text grammar (a programming language) into executable machine code for the PRCO304 processor.

The compiler is invoked with parameters for the input code file and optional parameters specifying optimisation level, target architecture, verbosity, output file name, and include directory paths. The full command line parameter list can be found in .

## 4.2 Functional Requirements

This section details the functional requirements (F) of the compiler, and their technical dependencies, that allow users to produce complete and functional programs. Figure 7.2 breaks down each functional requirement to show its technical dependencies.

- F1. **Text Components.** The compiler will be able to parse the terminals of the programming language's grammar (see section 4.4 Text Grammar) into distinct groups, such as text strings, arithmetic symbols, and other text symbols.

- F2. **Program flow manipulation.** The compiler will support divergent and branching program structures using unconditional and conditional jump instructions.
- F3. **User-defined values.** The compiler will support the creation of user-defined variables allowing the user to read and write values at their will.
- F4. **User-defined value manipulation.** The compiler will allow the user to modify user-defined variables during program execution.
- F5. **User-defined program flow.** The compiler will allow the user to control program divergence and repetition through the use of control statements (if and for statements).
- F6. **User-defined functional program.** The compiler will allow the combination of the above features to produce a complete and functional sequence of instructions ready for execution.
- F7. **User-defined encapsulated program.** The compiler will support encapsulating user-defined programs into functions to improve program control and support more complex programs.

For example, to support 'F5 User-defined program flow', the compiler needs to support control sequences such as 'for' and 'if' statements which themselves require implementation of 'variables' and 'conditional branching', and so on.

## 4.3 Implementation

The compiler is implemented fully in the ANSI C programming language due to my familiarity and experience in the language. The compiler is self-contained and requires no dependencies other than the standard C library and CMake to build the project. The project strictly follows the ANSI C89 style guide to make the code more readable and is compiled with -Wall -Wextra -Wno-comment.

### 4.3.1 Program Operation

The program flow of the PRCO304 compiler is detailed in Compiler Sequence Diagram 7.3.2.

#### **Building the Compiler**

To build the compiler, run the following commands:

```
cd prco304
mkdir build && cd build
cmake ..
cmake --build .
```

If you wish to build the compiler's own standard library run the following command as root/administrator to install the sources and header files:

```
cmake --build . --target install
```

## 4.4 Text Grammar

The input to the compiler is a generic programming language similar to C.

```
def main() {
    int a = 0;
}
```

The grammar is defined below in Backus-Naur Form:

```
<word>     ::= [a-zA-Z]+[0-9]*
<string>   ::= """ <word> """
<number>   ::= [0-9]+
<top>      ::= <func_def>|<decl>|<extern>
<func_def> ::= <proto><body>
<body>     ::= "{" <primary> "}"
<decl>     ::= <word> "=" <expr>
<control>  ::= <if>|<for>|<while>
<if>       ::= "if" "(" <expr> ")" <body>
<for>      ::= "for" "(" <expr> <expr> <expr> ")" <body>
<expr>     ::= <assign>|<binop>|<number>|<string>|"("|")"
<assign>   ::= <word> "=" <expr>
<binop>    ::= "+"|"-"|"*"|"/" <expr>
```

Figure 4.1: BNF definition for the input programming language.

It should be noted that the grammar and compiler do not have any terminals for defining datatypes, such as "short" and "int", because only one datatype is supported by both compiler and processor. Supporting different sized datatypes adds complexity that is out of scope: for example, the compiler would need to calculate how many 16-bit words to allocate on the stack for local parameters and access them through offsets.

### 4.4.1 Text Parser

The compiler implements its own recursive descent parser for the grammar described in 4.4. Recursive descent can handle a large class of context-free grammars (although left-recursive rules must first be rewritten), so the parser could in principle be extended to parse more complete programming languages such as C and Python.

The text parser is inspired by Jack Crenshaw's "Let's Build a Compiler" book, (Crenshaw, 1988).

While parser generators already exist, such as Bison and ANTLR, it was decided to implement the parser by hand using recursive descent principles as a matter of learning rather than ease of use. Although parsing a more complex grammar would be more easily achieved using a parser generator, generating compliant assembly for that grammar would be too time consuming and is hence out of scope (see extended deliverable E3.).

## 4.5 AST Generation

The recursive descent parser stores all terminals in the grammar as structures, defined in *ast.h*, containing relocatable information about the parsed text and its future implementation. This AST, the result of the text parser, is the initial intermediate representation used by the compiler.

```
struct ast_func {
    struct ast_proto *proto;
    struct ast_item *body;
    struct ast_item *exit;
    struct list_item *locals;
    struct ast_func *next;
    int num_local_vars;
};
```

Figure 4.2: An AST structure representing a parsed function. It contains sub-structures pointing to its prototype, body, exit statement, and a list of local variables. (ast.h:63)



Figure 4.3: UML class diagram showing the AST structures and their connections. The struct ast\_item structure is a top level structure that contains pointers to specific AST items (such as ast\_func and ast\_lvar). It is a self-referencing structure and can be iterated over as a linked list through its \*next property using the provided macro: list\_for\_each(). It can be thought of as a generic header for each AST type, allowing it to be passed as a void\* and still identified through its enum ast\_type type parameter.
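
The generic-header idea can be sketched as follows. The field names, the two enum values, the macro's argument order, and `ast_count` are assumptions made for illustration; the real definitions live in *ast.h* and may differ.

```c
#include <stddef.h>

/* Hypothetical subset of the AST type tags. */
enum ast_type { AST_FUNC, AST_LVAR };

/* Generic header: a type tag, a pointer to the specific AST structure,
 * and a link to the next item. Field names are assumed. */
struct ast_item {
    enum ast_type type;
    void *item;
    struct ast_item *next;
};

/* Assumed shape of the iteration macro: walk the list until NULL. */
#define list_for_each(pos, head) \
    for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Counts the items in a list that carry the given type tag. */
size_t ast_count(struct ast_item *head, enum ast_type type)
{
    struct ast_item *it;
    size_t n = 0;

    list_for_each(it, head) {
        if (it->type == type)
            n++;
    }
    return n;
}
```

A consumer checks the `type` tag first and only then casts `item` to the matching structure, which is what makes the `void*` safe to pass around.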

## 4.6 Optimisation

The PRCO304 compiler can optionally perform simple optimisations, such as unreachable code elimination and constant folding. The optimisations can be controlled by specifying the -On parameter to the CLI, where n is the level of optimisation.

The techniques used by the optimiser are primitive: it is not given AST information in SSA (static single assignment) form and, because of this limitation, only basic optimisations can be identified.

### 4.6.1 Constant Folding

Constant folding is performed by the optimiser to reduce (fold) expressions that can be identified as constant. This allows the optimiser to replace AST tree structures containing constant values and no dependencies with shorter and simpler AST items. This optimisation can drastically improve the performance of the output code by reducing the number of instructions emitted.

For example, the expression in Figure 4.4 can be identified as constant and can be reduced to a single AST node as shown in Figure 4.6. As the optimiser is not passed AST information in SSA form, it cannot follow or track variable references and modifications throughout the life-cycle of the program. Although the parser does contain a primitive symbol table, the symbol table does not map variables to values, and so the code segment in Figure 4.5 cannot be identified as constant by the optimiser.

```
int a = 1 + (2 + 3) * 4;
```

Figure 4.4: Example of an expression suitable for constant folding.

```
int a = 1;
int b = 2;
int c = a + b;
```

Figure 4.5: Example of an expression the optimiser cannot identify as constant.



Figure 4.6: AST transformation performed by Constant Folding.
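
A bottom-up fold of this kind can be sketched in C. The `struct node` tree below is a hypothetical stand-in, not the compiler's *ast.h* structures, and `fold` is an illustrative name; the point is only to show how a subtree of constants collapses to one node while anything touching a variable is left alone.

```c
#include <stdlib.h>

enum node_kind { NODE_NUM, NODE_VAR, NODE_BINOP };

/* Hypothetical expression node (not the compiler's own AST type). */
struct node {
    enum node_kind kind;
    int value;               /* valid when kind == NODE_NUM */
    char op;                 /* '+', '-', '*' or '/' for NODE_BINOP */
    struct node *lhs, *rhs;
};

struct node *node_new(enum node_kind kind, int value, char op,
                      struct node *lhs, struct node *rhs)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->value = value;
    n->op = op;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Folds constant subtrees in place. Returns 1 if n is now a constant. */
int fold(struct node *n)
{
    int l, r, result;

    if (n->kind == NODE_NUM)
        return 1;
    if (n->kind != NODE_BINOP)
        return 0;            /* variables have no known value here */

    /* Fold both sides even if one of them turns out not to be constant. */
    l = fold(n->lhs);
    r = fold(n->rhs);
    if (!l || !r)
        return 0;

    switch (n->op) {
    case '+': result = n->lhs->value + n->rhs->value; break;
    case '-': result = n->lhs->value - n->rhs->value; break;
    case '*': result = n->lhs->value * n->rhs->value; break;
    case '/':
        if (n->rhs->value == 0)
            return 0;        /* leave division by zero unfolded */
        result = n->lhs->value / n->rhs->value;
        break;
    default:
        return 0;
    }

    /* Replace the operator node with a single constant node. */
    free(n->lhs);
    free(n->rhs);
    n->lhs = n->rhs = NULL;
    n->kind = NODE_NUM;
    n->value = result;
    return 1;
}
```

Applied to the tree for `1 + (2 + 3) * 4`, the fold leaves a single constant node holding 21. A tree such as `a + 2` is returned unchanged because the variable's value is unknown to the folder.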

### 4.6.2 Unreachable Code Elimination

Unreachable code elimination is the removal of code that will never be run on the processor. This can be in the form of uncalled functions, unused variables, and control statements that operate on constant values.

The PRCO304 compiler can identify some unreachable code segments, such as control statements that operate on constant values, by utilising its constant folding optimisation discussed previously. After first running the constant folding optimisation on the body of functions, the optimiser looks at the conditions of *if* statements. If a condition has been folded to a constant and is *true* (i.e. not 0), the AST tree can be replaced with the items in its body, effectively removing the condition check; if it is false, the whole structure is removed.
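
The decision made for each *if* statement can be sketched as below. The `struct stmt` type and `simplify_if` are hypothetical and much simpler than the compiler's AST; they assume constant folding has already run and recorded whether the condition is a known constant.

```c
#include <stddef.h>

/* Hypothetical statement node, assumed to be annotated by constant folding. */
struct stmt {
    int is_if;          /* non-zero if this statement is an if */
    int cond_is_const;  /* non-zero if the condition folded to a constant */
    int cond_value;     /* the folded value, when cond_is_const is set */
    struct stmt *body;  /* statements inside the if body */
};

/*
 * Returns what should replace s in the tree:
 *   - the body, if the condition is a constant true (non-zero) value;
 *   - NULL, if the condition is a constant false value;
 *   - s itself, if nothing can be decided at compile time.
 */
struct stmt *simplify_if(struct stmt *s)
{
    if (s == NULL || !s->is_if || !s->cond_is_const)
        return s;
    return s->cond_value != 0 ? s->body : NULL;
}
```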

## 4.7 Code Generation

The compiler's Code Generation stage converts the optimised AST into an intermediate list of struct prco\_op\_struct. It does this by iterating over each struct ast\_item in the AST and checking whether the item requires code generation. For example, a struct ast\_item with type AST\_FUNC is one which requires code generation. The AST is then passed to the void cg\_func\_prco(...) function where the code generation takes place. For this type, the stack frame header is generated first, before the body of the function. At the end of the function's body, the stack frame end code generation routine is run.

This code generation stage is named intermediate because absolute addresses of JMP instructions have not been calculated. The calculation of these addresses is performed in the following Assembling stage. In addition, the locations (and offsets) of functions may need to be rearranged.

### 4.7.1 Variables

The PRCO304 compiler supports three types of variables in the high-level code: global variables (variables declared outside of functions); local variables (variables declared at the start of function bodies); and function arguments.

Similar to C89, all local variables must be declared at the start of the function before any logic, such as function calling. This is because the compiler will not rearrange the AST tree to move variable declarations to the first child of the function AST tree.

When a local variable is declared, stack space is immediately allocated for the variable by subtracting the data type size (1 word) from the Base Pointer variable. Although the code generator knows how many local variables are in a function, due to time constraints, it will not reduce/fold multiple stack allocations into a single SUBI instruction. The output machine code looks similar to Figure 4.9 below.

```
MOV  %Bp, %Sp       1ee0 (STACK FRAME)
SUBI %Sp, $1        5f01 (ALLOC a -3)
SUBI %Sp, $1        5f01 (ALLOC b -2)
SUBI %Sp, $1        5f01 (ALLOC c -1)
LW   %Ax, -3(%Bp)   08dd (REF a -3)
LW   %Ax, -2(%Bp)   08de (REF b -2)
LW   %Ax, -1(%Bp)   08df (REF c -1)
```

**Figure 4.7:** Disassembly of the output machine code for the high-level code (4.8).

```
def main() {
    int a; int b;
    int c;
    a; b;
    c;
}
```

**Figure 4.8:** Input high-level code showing 3 variable declarations and references.

Figure 4.9: Example machine code generation for local variables.

Variables are then accessed using the LW instruction and passing a 5-bit signed immediate constant as seen above.
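
The offsets in the disassembly (a at -3, b at -2, c at -1 for three locals) suggest a simple rule, sketched below. Both function names are hypothetical and the rule is inferred from that single example rather than taken from the compiler source. The range check follows from the LW/SW offset being a 5-bit two's complement field, which spans -16 to 15.

```c
/*
 * Offset from the Base Pointer of the index-th declared local
 * (index 0 is the first declaration) in a function with num_locals
 * locals. Inferred from the example: a=-3, b=-2, c=-1 when num_locals=3.
 */
int local_offset(int index, int num_locals)
{
    return index - num_locals;
}

/* Non-zero if the offset can be encoded in a 5-bit signed immediate. */
int fits_simm5(int offset)
{
    return offset >= -16 && offset <= 15;
}
```

If this rule holds, a function could reference at most 16 locals directly through the Base Pointer before the offset no longer fits the instruction.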


### 4.7.2 PUSH and POP

Due to limitations of the PRCO304 processor's instruction set, high-level instructions such as PUSH and POP cannot be performed in a single instruction. Instead, the compiler is able to replicate the behaviour of these high level instructions by emitting multiple primitive instructions. Figure 4.12 below details how the compiler emulates these high-level instructions.

```
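/* Emit the instruction sequence that emulates PUSH rd. */
void cg_push_prco(enum prco_reg rd)
{
    asm_push(opcode_add_ri(Sp, -1)); /* Sp <= Sp - 1 (reserve 1 word) */
    asm_push(opcode_sw(rd, Sp, 0));  /* RAM[Sp] <= rd */
    asm_comment("PUSH");
}

/* Emit the instruction sequence that emulates POP rd. */
void cg_pop_prco(enum prco_reg rd)
{
    asm_push(opcode_lw(rd, Sp, 0));  /* rd <= RAM[Sp] */
    asm_comment("POP");
    asm_push(opcode_add_ri(Sp, 1));  /* Sp <= Sp + 1 (release 1 word) */
}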
```

**Figure 4.10:** PUSH emulation. The Stack Pointer is decremented by the amount to store on the stack (1 word), followed by storing the destination register (*rd*) at the new Stack Pointer.

**Figure 4.11:** POP emulation. The value pointed to by the Stack Pointer is loaded into the destination register (*rd*), followed by incrementing the Stack Pointer by the size of the data type (1 word).

**Figure 4.12:** PUSH and POP emulation functions used by the PRCO304 compiler (*arch/prco\_impl.c:255*). Example of use: cg\_push\_prco(Ax) to push register Ax to the stack; cg\_pop\_prco(Ax) to pop stack into Ax.
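
The run-time effect of the two emitted sequences can be modelled in plain C. This is a sketch only: `struct machine`, the 256-word RAM size, and the function names are assumptions, and the stack is taken to grow towards lower addresses as the emulation code implies.

```c
#include <stdint.h>

#define RAM_WORDS 256 /* assumed memory size, for illustration only */

struct machine {
    uint16_t ram[RAM_WORDS];
    uint16_t sp; /* Stack Pointer (register H) */
};

/* PUSH: decrement Sp by one word, then store the value at the new Sp. */
void push(struct machine *m, uint16_t value)
{
    m->sp = (uint16_t)(m->sp - 1);
    m->ram[m->sp % RAM_WORDS] = value;
}

/* POP: load the value at Sp, then increment Sp by one word. */
uint16_t pop(struct machine *m)
{
    uint16_t value = m->ram[m->sp % RAM_WORDS];
    m->sp = (uint16_t)(m->sp + 1);
    return value;
}
```

Pushing 1 and then 2 and popping twice returns 2 and then 1, leaving the Stack Pointer where it started.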

## 4.8 Assembling

The final stage of the compiler is the assembling stage. This stage takes the list of struct prco\_op\_struct and outputs a list of machine code instructions. The assembler accomplishes this by calculating offsets and addresses of functions, branching instructions, and global variable addresses. It may also rearrange function locations so that the main function is the first instruction to be emitted.

Assembling code is found in assembler\_labels() at arch/template\_impl.c:38.

### 4.8.1 Executable Layout

Another role of the assembler for the PRCO304 processor is to output the machine code in a format that allows the widest range of programs to be run by the processor.

This format is not enforced by the processor core and it is up to the compiler to lay out the processor's memory contents. The only guarantee the processor makes is that it will start reading instructions from address 0x00. The compiler uses this information to structure the output program. The first two words of memory (0x00 and 0x01) contain MOVI and JMP instructions to jump the processor to the address of the main() function.
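
The two-word entry sequence can be sketched using the Type 2 encoding from section 3.4. The function names are hypothetical, and the choice of scratch register is left as a parameter because the text does not say which register the assembler uses.

```c
#include <stdint.h>

#define OP_MOVI 0x04u /* 00100 */
#define OP_JMP  0x0Cu /* 01100 */

/* Type 2 encoding: opcode[15:11] rd[10:8] imm8[7:0]. */
static uint16_t type2(unsigned op, unsigned rd, unsigned imm8)
{
    return (uint16_t)((op << 11) | ((rd & 0x7u) << 8) | (imm8 & 0xFFu));
}

/*
 * Writes the entry stub into the first two words of the image:
 *   0x00: MOVI scratch_reg, main_addr
 *   0x01: JMP  scratch_reg, unconditional (condition code 0)
 */
void emit_entry_stub(uint16_t image[2], unsigned scratch_reg,
                     unsigned main_addr)
{
    image[0] = type2(OP_MOVI, scratch_reg, main_addr);
    image[1] = type2(OP_JMP, scratch_reg, 0);
}
```

With register A as scratch and main() placed at 0x20, the stub words are 0x2020 and 0x6000.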

### 4.8.2 Limitations

Due to time constraints, the assembler imposes several limitations on the output program that are not explicitly identified in the high-level code.

The most prominent limitation is that the assembler can only address 255 words of memory. This is because the assembler only builds up instruction addresses using a single MOVI instruction, which is limited to an 8-bit immediate. This is easily fixable as the assembler could insert additional instructions to build up 16-bit addresses to use. For example, to build a 16-bit address, *0xFECA*, the following instructions could be used:

```
MOVI %Ax, $0xFE
LSHF %Ax, $8
ORI %Ax, $0xCA
JMP %Ax, JE_UC (unconditional)
```
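
The arithmetic behind that sequence is shown below in C. Note that LSHF and ORI are proposed instructions and do not appear in the instruction table of section 3.4.2; the helper names here are likewise hypothetical.

```c
#include <stdint.h>

/* Split a 16-bit address into the two 8-bit immediates to emit. */
uint8_t addr_hi(uint16_t addr) { return (uint8_t)(addr >> 8); }
uint8_t addr_lo(uint16_t addr) { return (uint8_t)(addr & 0xFFu); }

/* Mirrors the proposed three-instruction sequence step by step. */
uint16_t build_addr(uint8_t hi, uint8_t lo)
{
    uint16_t r = hi;          /* MOVI: load the high byte */
    r = (uint16_t)(r << 8);   /* LSHF: shift it into bits 15-8 */
    r = (uint16_t)(r | lo);   /* ORI:  merge in the low byte */
    return r;
}
```

For 0xFECA the immediates are 0xFE and 0xCA, and recombining them gives back 0xFECA.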



Figure 4.13: PRCO304 memory layout.

## 4.9 Testing and Verification

Verifying the output assembly is a bit more involved as there are multiple layers of tests required. The output code generation must be tested for:

- (A) Correct instruction and machine code building.
- (B) Correct instruction sequences for different code generation routines.
- (C) Correct and complete flow of the output program and any divergent paths.

#### **Unit Testing**

For (A), a code generation routine refers to the code generation function used to produce machine code for a specific structure, for example a function or assignment expression. When machine code instructions are emitted from the code generation routines, they are pushed to a list of struct prco\_op\_struct containing information about the emitted instruction. Using this information, the final output machine code word (e.g. 0x2020) is rebuilt into an equivalent struct prco\_op\_struct structure and compared against the original. If they are the same, the encoded machine code word is considered correct. This check happens every time an instruction is emitted from the code generation routines.
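
The encode, decode, and compare check can be sketched as follows for Type 1 instructions. `struct op_fields` is a hypothetical stand-in for struct prco\_op\_struct, whose real fields are not shown in this report, and the function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of an emitted Type 1 instruction. */
struct op_fields {
    unsigned op;  /* 5-bit opcode */
    unsigned rd;  /* destination register index */
    unsigned ra;  /* source register index */
    int simm5;    /* signed 5-bit immediate */
};

uint16_t op_encode(const struct op_fields *f)
{
    return (uint16_t)(((f->op & 0x1Fu) << 11) | ((f->rd & 0x7u) << 8) |
                      ((f->ra & 0x7u) << 5) | ((unsigned)f->simm5 & 0x1Fu));
}

struct op_fields op_decode(uint16_t word)
{
    struct op_fields f;
    unsigned imm = word & 0x1Fu;

    f.op = (word >> 11) & 0x1Fu;
    f.rd = (word >> 8) & 0x7u;
    f.ra = (word >> 5) & 0x7u;
    /* Sign-extend the 5-bit immediate. */
    f.simm5 = (imm & 0x10u) ? (int)imm - 32 : (int)imm;
    return f;
}

/* Returns 1 if decoding the encoded word reproduces the original fields. */
int op_roundtrip_ok(const struct op_fields *f)
{
    struct op_fields d = op_decode(op_encode(f));

    return d.op == f->op && d.rd == f->rd &&
           d.ra == f->ra && d.simm5 == f->simm5;
}
```

A check of this shape also catches fields that do not fit their bit width: an immediate of 40 is truncated during encoding, so the decoded value differs and the comparison fails.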

#### **Real Hardware Tests**

The *best* approach to verifying output machine code is to run it directly on the PRCO304 processor. However, this requires rebuilding the FPGA design with the new program code which is time consuming and not always practical. In addition, viewing of internal registers and signals is much more difficult due to the lack of a debugging interface on the processor.

#### **Emulation**

During the later stages of the project, it was decided to build a software emulator for the PRCO304 processor. The emulator, found at prco\_compiler/emu/emu.c, utilises structures used in the compiler's assembling stage to rebuild instructions and their contents from raw machine code words. Because the emulator was developed late, is not a deliverable, and exists only as an alternate means to test the processor, it is not a full emulation of the processor. It implements most features of the processor, including registers, memory, ALU operations, and most instructions. It is capable of emulating all the test programs (found in proc\_compiler/tests/\*.prcp) except the control\_for\_\* programs.
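
The core of such an emulator is a fetch, decode, and execute loop. The sketch below is not the code in emu.c: the structure and function names are hypothetical, only a handful of opcodes from section 3.4.2 are modelled, and every other opcode is treated as a NOP.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcodes taken from the instruction table in section 3.4.2. */
enum {
    EMU_OP_MOV = 0x03, EMU_OP_MOVI = 0x04, EMU_OP_ADD = 0x08,
    EMU_OP_ADDI = 0x09, EMU_OP_SUB = 0x0A, EMU_OP_SUBI = 0x0B,
    EMU_OP_HALT = 0x12
};

struct emu {
    uint16_t reg[8]; /* registers A..H */
    uint16_t pc;
    int halted;
};

/* Sign-extend the low 8 bits of an instruction word. */
static int sext8(unsigned v)
{
    v &= 0xFFu;
    return (v & 0x80u) ? (int)v - 256 : (int)v;
}

/* Executes one instruction from mem at the current PC. */
void emu_step(struct emu *e, const uint16_t *mem)
{
    uint16_t word = mem[e->pc];
    unsigned op = (word >> 11) & 0x1Fu;
    unsigned rd = (word >> 8) & 0x7u;
    unsigned ra = (word >> 5) & 0x7u;
    int imm8 = sext8(word);

    e->pc = (uint16_t)(e->pc + 1);
    switch (op) {
    case EMU_OP_MOV:  e->reg[rd] = e->reg[ra]; break;
    case EMU_OP_MOVI: e->reg[rd] = (uint16_t)imm8; break;
    case EMU_OP_ADD:  e->reg[rd] = (uint16_t)(e->reg[rd] + e->reg[ra]); break;
    case EMU_OP_ADDI: e->reg[rd] = (uint16_t)(e->reg[rd] + imm8); break;
    case EMU_OP_SUB:  e->reg[rd] = (uint16_t)(e->reg[rd] - e->reg[ra]); break;
    case EMU_OP_SUBI: e->reg[rd] = (uint16_t)(e->reg[rd] - imm8); break;
    case EMU_OP_HALT: e->halted = 1; break;
    default: break; /* NOP and opcodes not modelled in this sketch */
    }
}

/* Runs until HALT, the end of memory, or max_steps. Returns steps taken. */
size_t emu_run(struct emu *e, const uint16_t *mem, size_t mem_words,
               size_t max_steps)
{
    size_t steps = 0;

    while (!e->halted && steps < max_steps && e->pc < mem_words) {
        emu_step(e, mem);
        steps++;
    }
    return steps;
}
```

Running the three words 0x2005 (MOVI A, 5), 0x4803 (ADDI A, 3), and 0x9000 (HALT) leaves 8 in register A after three steps.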

Given extra time, the emulator could be integrated with the unit tests. This would allow for the dynamic compilation of high-level code test programs and analysis of their behaviour during emulation against expected values. This would greatly speed up compiler code generation development times.

## 4.10 Sub-project Review

### 4.10.1 Core Deliverables

Table 4.1: Compiler Core Deliverables Review

| Deliverable                     | Implemented | Version | Comment                                                 |
|---------------------------------|-------------|---------|---------------------------------------------------------|
| CFG Text Parser                 | Yes         | 1.00    | Recursive descent parser.                               |
| AST Intermediate Representation | Yes         | 1.00    |                                                         |
| Basic arithmetic Operators      | Yes         | 1.10    |                                                         |
| IF statements                   | Yes         | 1.30    | `if ( <expr> ) { }`                                     |
| IF ELSE statements              | Yes         | 1.30    | `if ( <expr> ) { } else { }`                            |
| FOR statements                  | Yes         | 1.30    |                                                         |
| Functions                       | Yes         | 1.20    |                                                         |
| Variables                       | Limited     | 1.50    | Only local variables. No global or parameter variables. |
| Code generation                 | Yes         | 1.30    |                                                         |
| Assembling                      | Yes         | 1.40    | Basic offset calculation and re-encoding.               |

### 4.10.2 Extended Deliverables

Table 4.2: Compiler Extended Deliverables Review

| Deliverable                                      | Implemented | Version | Comment                                                                                                                                                                                                                         |
|--------------------------------------------------|-------------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Assembly text parsing                            | No          | X       | Only the CFG text parser for the grammar described in section 4.4 is present. This would be a desirable feature for implementing features the compiler is not able to.                                                          |
| Optimisation: Constant Folding                   | Yes         | 2.00    | Limited to compile time constants. See prco_compiler/libprco/opt.c:9.                                                                                                                                                           |
| Optimisation:<br>Unreachable-code<br>elimination | Yes         | 2.00    | Limited to IF statement constants. See prco_compiler/libprco/opt.c:33.                                                                                                                                                          |
| String variables                                 | Limited     | 2.10    | All strings are placed in low memory, not on stack. Limited to alphanumeric identities (easily fixable).                                                                                                                        |
| Dereferencing                                    | Yes         | 2.10    | Uses low precedence @ symbol. Unsafe like C's * dereferencing functionality. See prco_compiler/test/tests/control_for_3.pr                                                                                                      |
| Pointer arithmetic                               | Yes         | 2.10    | Inferred by dereferencing implementation.<br>E.g. @(a+1) returns contents of memory address at value a+1.                                                                                                                       |
| Assembler memory layout                          | Limited     | 2.10    | First instructions (low memory) jump into <i>middle</i> memory where the main function is located. Low memory consists of <i>global values</i> (like strings and global variables). High memory is reserved for stack management. |
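
The constant-folding row in Table 4.2 can be illustrated with a minimal sketch. It assumes a hypothetical `node` type with constant, variable, and binary-operator kinds; these names are illustrative and are not taken from prco_compiler/libprco/opt.c. A binary node whose children both fold to constants is rewritten in place as a single constant.

```c
#include <stdlib.h>

/* Hypothetical AST node; the real libprco structures may differ. */
typedef enum { NODE_CONST, NODE_VAR, NODE_BINOP } node_kind;

typedef struct node {
    node_kind kind;
    char op;                 /* '+', '-' or '*' when kind == NODE_BINOP */
    int value;               /* valid when kind == NODE_CONST */
    struct node *lhs, *rhs;  /* children when kind == NODE_BINOP */
} node;

node *make_node(node_kind kind) {
    node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    return n;
}

node *make_const(int value) {
    node *n = make_node(NODE_CONST);
    n->value = value;
    return n;
}

node *make_var(void) { return make_node(NODE_VAR); }

node *make_binop(char op, node *lhs, node *rhs) {
    node *n = make_node(NODE_BINOP);
    n->op = op;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Fold bottom-up: once both children are constants, replace the
   operator node with the computed constant. */
void fold_constants(node *n) {
    if (n == NULL || n->kind != NODE_BINOP) return;
    fold_constants(n->lhs);
    fold_constants(n->rhs);
    if (n->lhs->kind != NODE_CONST || n->rhs->kind != NODE_CONST) return;

    int a = n->lhs->value, b = n->rhs->value, result;
    switch (n->op) {
    case '+': result = a + b; break;
    case '-': result = a - b; break;
    case '*': result = a * b; break;
    default:  return;        /* unknown operator: leave the node untouched */
    }
    free(n->lhs);
    free(n->rhs);
    n->lhs = n->rhs = NULL;
    n->kind = NODE_CONST;
    n->value = result;
}
```

With this sketch, the tree for `2 + 3 * 4` folds to a single constant node, while `x + 1` is left as a binary node because one operand is not known at compile time.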

# **Post-Project**

- 5.1 Project Post-mortem
  - 5.1.1 Project Objectives
  - 5.1.2 Development Process
  - 5.1.3 Personal Contributions
- 5.2 Conclusion

## 5.1 Project Post-mortem

### 5.1.1 Project Objectives

### 5.1.2 Development Process

### 5.1.3 Personal Contributions

## 5.2 Conclusion

This project aimed to produce two complex technical systems: an embedded processor and a compiler. Both systems were developed, and together they form a valuable educational resource. The technologies produced on the compiler side of the project include:

- an easily extendible recursive-descent text parser;
- an AST optimiser for constant-folding and unreachable code elimination;
- a machine code generator and assembler;
- and an emulator.

And from the embedded processor:

- a 16-bit instruction set and its implementation;
- and a feed-forward pipeline architecture.

I believe these technologies and their implementation details should be shared as an open, educational resource for future projects and people interested in low-level code generation and embedded processor architecture.

# References

Arty Artix-7 FPGA Development Board (2015).

**URL:** https://uk.rs-online.com/web/p/programmable-logic-development-kits/1346478/

Bell, R. (2006). Introduction to IEC 61508, pp. 3-12.

Bowen, J. and Stavridou, V. (1993). Safety-critical systems, formal methods and standards, Software Engineering Journal **8**(4): 189–209.

Crenshaw, J. W. (1988). Let's build a compiler!

MiniSpartan6+ (2014).

**URL:** https://www.scarabhardware.com/minispartan6/

Shanley, T. (2010). X86 Instruction Set Architecture, Mindshare Press.

# **Appendices**

- 7.1 Appendix A. User Guides
  - 7.1.1 PRCO304 Core Reference Guide
  - 7.1.2 PRCO304 Compiler Reference Guide
  - 7.1.3 PRCO304 Emulator Reference Guide
  - 7.1.4 PRCO304 Processor Instruction Set Architecture
- 7.2 Appendix B. Project Management Artefacts
  - 7.2.1 Project Initiation Document
  - 7.2.2 Highlight Reports
- 7.3 Appendix C. Other Documents
  - 7.3.1 Compiler Functional Requirements
  - 7.3.2 Compiler Sequence Diagram

## 7.1 Appendix A. User Guides

### 7.1.1 PRCO304 Core Reference Guide

### Instantiating the core in your FPGA design

The PRCO304 processor core can be instantiated in your FPGA design with the following code snippet:

```
// Instantiate a processor core
prco_core inst_core (
    .i_clk(),
    .i_en(),
    .i_reset(),
    // Operating mode (HIGH=single-step)
    .i_mode(),
    // Single-step pulse
    .i_step(),
    // UART comms
    .i_rx(),
    .q_tx(),
    .q_tx_byte(),
    // Debug outputs
    .q_debug_instr_clk(),
    .q_debug()
);
```

### 7.1.2 PRCO304 Compiler Reference Guide

#### **Command Line Interface Arguments**

#### Name

cli - compile a program into executable machine code for the PRCO304 processor.

#### **Synopsis**

```
cli [OPTION]... -i{FILE}
```

#### **Description**

- -d Dump output machine code to a file
- -D{bits} Select debug printing level. Example of use: -D0xFF to enable all debug bits.
- -i{file} Pass the input file to the compiler. Example of use: -i code.prco.
- -O{0-1} Select the optimisation level. 0 = no optimisations, >0 = constant folding and unreachable code elimination.
- -m{arch} Pass the target architecture to the compiler. Deprecated.
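
The options above attach their value directly to the flag, as in `-D0xFF`. The following is a minimal sketch of how such a bit-mask argument could be parsed. The function name `parse_debug_bits` is hypothetical and is not taken from the compiler source; it only illustrates the `-D{bits}` convention described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the mask when arg has the
   form -D{bits}, otherwise returns 0 and leaves *mask unchanged. */
int parse_debug_bits(const char *arg, unsigned long *mask) {
    if (strncmp(arg, "-D", 2) != 0 || arg[2] == '\0') return 0;

    char *end;
    /* Base 0 lets strtoul accept both "0xFF" and plain decimal values. */
    unsigned long value = strtoul(arg + 2, &end, 0);
    if (*end != '\0') return 0;   /* trailing characters: not a valid mask */

    *mask = value;
    return 1;
}
```

Individual debug categories could then be tested with a bitwise AND against the returned mask.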

### 7.1.3 PRCO304 Emulator Reference Guide

#### Name

emu - Disassemble and emulate PRCO304 processor programs.

#### **Synopsis**

```
emu [OPTION]... -i{FILE}
```

#### **Description**

- -i{file} Input machine code file. One instruction word per line. CRLF and LF line endings are accepted.
- -D{bits} Select debug printing level. Example of use: -D0xFF to enable all debug bits.
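
The input format (one instruction word per line, with either line ending) can be read with a few lines of C. This is a minimal sketch that assumes each word is written in hexadecimal, as in the memory dumps shown in this guide; `parse_word_line` is a hypothetical name and not the emulator's own routine.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse one line holding a single hexadecimal instruction word.
   A trailing "\n" or "\r\n" is accepted. Returns 1 on success. */
int parse_word_line(const char *line, uint16_t *word) {
    char *end;
    unsigned long value = strtoul(line, &end, 16);

    if (end == line) return 0;                         /* no hex digits found */
    if (strspn(end, "\r\n") != strlen(end)) return 0;  /* junk after the word */
    if (value > 0xFFFF) return 0;                      /* wider than 16 bits  */

    *word = (uint16_t)value;
    return 1;
}
```

Calling this once per line read with `fgets` would fill the emulator's program memory word by word.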

#### **Example Output**

Disassembly of Input:

| Instruction | Operands   | Word | Trailing output |
|-------------|------------|------|-----------------|
| ADDI        | \$-1, Sp   | 4fff | 0 (null)        |
| SW          | Bp, +0(Sp) | 16e0 | 0 (null)        |
| MOV         | Bp, Sp     | 1ee0 | 0 (null)        |
| SUBI        | \$+1, Sp   | 5f01 | 0 (null)        |
| MOVI        | \$62, Ax   | 2062 | 0 (null)        |
| SW          | Ax, -1(Bp) | 10df | 0 (null)        |
| LW          | Ax, -1(Bp) | 08df | 0 (null)        |
| WRITE       | Ax, UART1  | 9800 | (null)          |
| MOVI        | \$65, Ax   | 2065 | 0 (null)        |
| SW          | Ax, -1(Bp) | 10df | 0 (null)        |
| LW          | Ax, -1(Bp) | 08df | 0 (null)        |
| WRITE       | Ax, UART1  | 9800 | (null)          |
| MOVI        | \$6e, Ax   | 206e | 0 (null)        |
| SW          | Ax, -1(Bp) | 10df | 0 (null)        |
| LW          | Ax, -1(Bp) | 08df | 0 (null)        |
| WRITE       | Ax, UART1  | 9800 | (null)          |
| MOVI        | \$20, Ax   | 2020 | 0 (null)        |
| SW          | Ax, -1(Bp) | 10df | 0 (null)        |
| LW          | Ax, -1(Bp) | 08df | 0 (null)        |
| WRITE       | Ax, UART1  | 9800 | (null)          |
| MOV         | Sp, Bp     | 1fc0 | 0 (null)        |
| LW          | Bp, +0(Sp) | 0ee0 | 0 (null)        |
| ADDI        | \$+1, Sp   | 4f01 | 0 (null)        |
| HALT        |            | 9000 | 0 (null)        |

Initial Memory layout:

| 00   | 01   | 02   | 03   | 04   | 05   | 06   | 07   | 08   |
|------|------|------|------|------|------|------|------|------|
| 4fff | 16e0 | 1ee0 | 5f01 | 2062 | 10df | 8df  | 9800 | 2065 |
| 10df | 8df  | 9800 | 206e | 10df | 8df  | 9800 | 2020 | 10df |
| 8df  | 9800 | 1fc0 | ee0  | 4f01 | 9000 | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   |      |      |      |      |      |      |

Executed Instructions:

| Address | Instruction | Operands   | Word | Trailing output | Trace output                                 |
|---------|-------------|------------|------|-----------------|----------------------------------------------|
|         |             |            |      |                 | 00 00 00 00 00 00 00 ff                      |
| 0x00    | ADDI        | \$-1, Sp   | 4fff | 0 (null)        | 00 00 00 00 00 00 00 fe                      |
| 0x01    | SW          | Bp, +0(Sp) | 16e0 | 0 (null)        | SW \$00, mem[fe]                             |
| 0x02    | MOV         | Bp, Sp     | 1ee0 | 0 (null)        | 00 00 00 00 00 00 fe fe                      |
| 0x03    | SUBI        | \$+1, Sp   | 5f01 | 0 (null)        | 00 00 00 00 00 00 fe fd                      |
| 0x04    | MOVI        | \$62, Ax   | 2062 | 0 (null)        | 62 00 00 00 00 00 fe fd                      |
| 0x05    | SW          | Ax, -1(Bp) | 10df | 0 (null)        | SW \$62, mem[00]                             |
| 0x06    | LW          | Ax, -1(Bp) | 08df | 0 (null)        | LW mem[00], \$62<br>62 00 00 00 00 00 fe fd  |
| 0x07    | WRITE       | Ax, UART1  | 9800 | (null)          | PORT 0<br>UART <- 'b' 0x62                   |
| 0x08    | MOVI        | \$65, Ax   | 2065 | 0 (null)        | 65 00 00 00 00 00 fe fd                      |
| 0x09    | SW          | Ax, -1(Bp) | 10df | 0 (null)        | SW \$65, mem[00]                             |
| 0x0a    | LW          | Ax, -1(Bp) | 08df | 0 (null)        | LW mem[00], \$65<br>65 00 00 00 00 00 fe fd  |
| 0x0b    | WRITE       | Ax, UART1  | 9800 | (null)          | PORT 0<br>UART <- 'e' 0x65                   |
| 0x0c    | MOVI        | \$6e, Ax   | 206e | 0 (null)        | 6e 00 00 00 00 00 fe fd                      |
| 0x0d    | SW          | Ax, -1(Bp) | 10df | 0 (null)        | SW \$6e, mem[00]                             |
| 0x0e    | LW          | Ax, -1(Bp) | 08df | 0 (null)        | LW mem[00], \$6e<br>6e 00 00 00 00 00 fe fd  |
| 0x0f    | WRITE       | Ax, UART1  | 9800 | (null)          | PORT 0<br>UART <- 'n' 0x6e                   |
| 0x10    | MOVI        | \$20, Ax   | 2020 | 0 (null)        | 20 00 00 00 00 00 fe fd                      |
| 0x11    | SW          | Ax, -1(Bp) | 10df | 0 (null)        | SW \$20, mem[00]                             |
| 0x12    | LW          | Ax, -1(Bp) | 08df | 0 (null)        | LW mem[00], \$20<br>20 00 00 00 00 00 fe fd  |
| 0x13    | WRITE       | Ax, UART1  | 9800 | (null)          | PORT 0<br>UART <- ' ' 0x20                   |
| 0x14    | MOV         | Sp, Bp     | 1fc0 | 0 (null)        | 20 00 00 00 00 00 fe fe                      |
| 0x15    | LW          | Bp, +0(Sp) | 0ee0 | 0 (null)        | LW mem[fe], \$00<br>20 00 00 00 00 00 00 fe  |
| 0x16    | ADDI        | \$+1, Sp   | 4f01 | 0 (null)        | 20 00 00 00 00 00 00 ff                      |
| 0x17    | HALT        |            | 9000 | 0 (null)        |                                              |

Final Memory contents:

| 00   | 01   | 02   | 03   | 04   | 05   | 06   | 07   | 08   |
|------|------|------|------|------|------|------|------|------|
| 4fff | 16e0 | 1ee0 | 5f01 | 2062 | 10df | 8df  | 9800 | 2065 |
| 10df | 8df  | 9800 | 206e | 10df | 8df  | 9800 | 2020 | 10df |
| 8df  | 9800 | 1fc0 | ee0  | 4f01 | 9000 | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   | 00   |
| 00   | 20   | 00   |      |      |      |      |      |      |

Final Registers:
20 00 00 00 00 00 00 ff

UART tx buf:
ben

### 7.1.4 PRCO304 Processor Instruction Set Architecture

#### **NOP**

**Description** The NOP instruction performs no action for 1 instruction cycle (see section ??).

**Assembly** NOP

**Pseudocode**

**Registers altered** None

**Clock cycles** 2 (FETCH, DECODE)

| 15:11 | 10:0 |
|-------|------|
| 00000 | X    |

#### **LW - Load Word**

**Description** Copies a 16-bit word from RAM to a register.

**Assembly** LW Rd, +4(Ra)

**Pseudocode** Rd <= RAM[Ra + Simm5]

**Registers altered** Rd

**Clock cycles** 6 (FETCH, DECODE, READ, EXECUTE, RAM, WRITE)

| 15:11 | 10:8 | 7:5 | 4:0   |
|-------|------|-----|-------|
| 00001 | Rd   | Ra  | Simm5 |
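
The field layout above can be unpacked with shifts and masks. The sketch below is illustrative only: `mem_instr` and `decode_mem` are hypothetical names. It sign-extends the 5-bit offset, which matches the emulator listing in section 7.1.3, where the word 08df is shown as `LW Ax, -1(Bp)`.

```c
#include <stdint.h>

/* Hypothetical decoded form of an LW/SW-style instruction word. */
typedef struct {
    unsigned opcode;  /* bits 15:11 */
    unsigned rd;      /* bits 10:8  */
    unsigned ra;      /* bits 7:5   */
    int simm5;        /* bits 4:0, sign-extended to a C int */
} mem_instr;

mem_instr decode_mem(uint16_t word) {
    mem_instr d;
    d.opcode = (word >> 11) & 0x1F;
    d.rd = (word >> 8) & 0x7;
    d.ra = (word >> 5) & 0x7;

    int raw = word & 0x1F;
    /* Bit 4 is the sign bit of the 5-bit two's-complement offset. */
    d.simm5 = (raw & 0x10) ? raw - 32 : raw;
    return d;
}
```

Decoding 08df this way gives opcode 00001, Rd = 0, Ra = 6 and an offset of -1. The register numbers 0 for Ax and 6 for Bp are inferred from the listing rather than stated in this guide.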

#### **SW - Store Word**

**Description** Copies a 16-bit word from a register to RAM.

**Assembly** SW Rd, +4(Ra)

**Pseudocode** RAM[Ra + Simm5] <= Rd

**Registers altered** None

**Clock cycles** 6 (FETCH, DECODE, READ, EXECUTE, RAM, WRITE)

| 15:11 | 10:8 | 7:5 | 4:0   |
|-------|------|-----|-------|
| 00010 | Rd   | Ra  | Simm5 |
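
The LW and SW pseudocode can be exercised with a small software model. This is a sketch under stated assumptions, not the emulator's implementation: the `machine` struct, the function names, and the 256-word RAM size are all hypothetical, with the RAM size chosen to match the two-digit addresses in the emulator trace in section 7.1.3.

```c
#include <stdint.h>

#define RAM_WORDS 256  /* assumption: matches the 8-bit addresses in the trace */

/* Hypothetical machine state: 8 registers and word-addressed RAM. */
typedef struct {
    uint16_t regs[8];
    uint16_t ram[RAM_WORDS];
} machine;

/* Effective address Ra + Simm5, wrapped to the RAM size. */
static unsigned effective_addr(const machine *m, unsigned ra, int simm5) {
    return (unsigned)(m->regs[ra] + simm5) % RAM_WORDS;
}

/* SW: RAM[Ra + Simm5] <= Rd */
void exec_sw(machine *m, unsigned rd, unsigned ra, int simm5) {
    m->ram[effective_addr(m, ra, simm5)] = m->regs[rd];
}

/* LW: Rd <= RAM[Ra + Simm5] */
void exec_lw(machine *m, unsigned rd, unsigned ra, int simm5) {
    m->regs[rd] = m->ram[effective_addr(m, ra, simm5)];
}
```

With register 6 holding fe and register 0 holding 62, `exec_sw(&m, 0, 6, -1)` writes 62 to address fd, which is the local-variable slot used by the example program in the emulator guide.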

#### **MOVR**

**Description** The MOVR instruction copies a 16-bit register value to another register.

**Assembly** MOVR %Ra, %Rd

**Pseudocode** Rd <= Ra

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXECUTE, WRITE)

| 15:11 | 10:8 | 7:5 | 4:0 |
|-------|------|-----|-----|
| 00011 | Rd   | Ra  | Х   |

#### **MOVI**

**Description** The MOVI instruction copies an 8-bit immediate value into a register.

**Assembly** MOVI \$255, %Rd

**Pseudocode** Rd <= Imm8

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXECUTE, WRITE)

| 15:11 | 10:8 | 7:0  |
|-------|------|------|
| 00100 | Rd   | Imm8 |

#### **ADD**

**Description** The ADD instruction adds a source register, Ra, to a destination register, Rd.

**Assembly** ADD %Ra, %Rd

**Pseudocode** Rd <= Rd + Ra

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXEC, WRITE)

| 15:11 | 10:8 | 7:5 | 4:0 |
|-------|------|-----|-----|
| 01000 | Rd   | Ra  | X   |

#### **ADDI**

**Description** The ADDI instruction adds an 8-bit immediate value to a destination register, Rd.

**Assembly** ADDI \$255, %Rd

**Pseudocode** Rd <= Rd + Imm8

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXEC, WRITE)

| 15:11 | 10:8 | 7:0  |
|-------|------|------|
| 01001 | Rd   | Imm8 |
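
The Imm8 format shared by MOVI and ADDI can be checked against the emulator listing in section 7.1.3, where `MOVI $62, Ax` appears as 2062 and `ADDI $-1, Sp` as 4fff. The sketch below uses hypothetical helper names. Treating the immediate as signed is an assumption based on the listing printing ff as -1, and the register numbers for Ax and Sp are likewise inferred from it.

```c
#include <stdint.h>

enum { OP_MOVI = 0x04, OP_ADDI = 0x09 };  /* opcodes from the encoding tables */
enum { REG_AX = 0, REG_SP = 7 };          /* assumption: inferred from the listing */

/* Pack opcode (15:11), Rd (10:8) and Imm8 (7:0) into one word. */
uint16_t encode_imm8(unsigned opcode, unsigned rd, int imm) {
    return (uint16_t)(((opcode & 0x1Fu) << 11) |
                      ((rd & 0x7u) << 8) |
                      ((unsigned)imm & 0xFFu));
}

/* Rd <= Rd + Imm8, with Imm8 sign-extended and the sum wrapped to 16 bits. */
uint16_t exec_addi(uint16_t rd_value, uint8_t imm8) {
    int simm = (imm8 & 0x80) ? (int)imm8 - 256 : (int)imm8;
    return (uint16_t)(rd_value + simm);
}
```

Under these assumptions, adding the immediate ff to a stack pointer of 00ff yields 00fe, which agrees with the first step of the emulator trace.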

#### **SUBI**

**Description** The SUBI instruction subtracts an 8-bit immediate value from a destination register, Rd.

**Assembly** SUBI \$255, %Rd

**Pseudocode** Rd <= Rd - Imm8

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXEC, WRITE)

| 15:11 | 10:8 | 7:0  |
|-------|------|------|
| 01011 | Rd   | Imm8 |

#### **CMP**

**Description** Sets register, Rd, to the value of Ra - Rb.

**Assembly** CMP Rd, Ra, Rb

**Pseudocode** Rd <= CMP(Ra, Rb)

**Registers altered** Rd

**Clock cycles** 5 (FETCH, DECODE, READ, EXEC, WRITE)

**Note** Rd should be set to SR (??) as the instruction operates on the SR register.

| 15:12 | 11:9 | 8:6 | 5:3 | 2:0 |
|-------|------|-----|-----|-----|
| 0003  | Rd   | Ra  | Rb  | X   |
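
The conditional jumps in Table 7.1 test the ZF, SF and OF flags, so a comparison has to derive them from Ra - Rb. The sketch below shows one conventional way to do this for 16-bit operands. It is an assumption modelled on x86-style flag semantics, not a description of the core's ALU, and the `flags` struct and `compare` function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag set; the bit positions within SR are not modelled. */
typedef struct {
    bool zf;  /* zero flag: result is zero */
    bool sf;  /* sign flag: bit 15 of the result */
    bool of;  /* overflow flag: signed subtraction overflowed */
} flags;

flags compare(uint16_t ra, uint16_t rb) {
    uint16_t diff = (uint16_t)(ra - rb);
    flags f;
    f.zf = (diff == 0);
    f.sf = (diff & 0x8000) != 0;
    /* Signed overflow occurs when the operands have different signs
       and the sign of the result differs from the sign of ra. */
    f.of = (((ra ^ rb) & (ra ^ diff)) & 0x8000) != 0;
    return f;
}
```

For example, comparing 3 with 5 gives SF = 1 and OF = 0, so SF<>OF holds and a JL would be taken.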

#### **JMP**

**Description** Jumps the ?? if the condition is met within the ?? register.

**Assembly** JMP Rd, Imm8

**Pseudocode** ?? <= Rd if (?? & Imm8).

**Registers altered** None

**Clock cycles** 5 (FETCH, DECODE, READ, EXEC, BRANCH)

| 15:11 | 10:8 | 7:0  |
|-------|------|------|
| 01100 | Rd   | Imm8 |

An 8-bit immediate (bits 7:0) can be set in the JMP instruction to select the jump condition.

Table 7.1: Conditional jump immediate bits

| Mnemonic | 15:11 | 10:8 | 7:0 | Semantics | Status Register |
|-----|-------|------|-----------|----------------------------|-----------------|
| JMP | 01100 | Rd   | 0000 0000 | Unconditional Jump         | Any             |
| JE  | 01100 | Rd   | 0000 0001 | Jump Equal                 | ZF=1            |
| JNE | 01100 | Rd   | 0000 0010 | Jump Not Equal             | ZF=0            |
| JG  | 01100 | Rd   | 0000 0011 | Jump Greater Than          | ZF=0 and SF=OF  |
| JGE | 01100 | Rd   | 0000 0100 | Jump Greater Than or Equal | SF=OF           |
| JL  | 01100 | Rd   | 0000 0101 | Jump Less Than             | SF<>OF          |
| JLE | 01100 | Rd   | 0000 0110 | Jump Less Than or Equal    | ZF=1 or SF<>OF  |
| JS  | 01100 | Rd   | 0000 0111 | Jump Signed                | SF=1            |
| JNS | 01100 | Rd   | 0000 1000 | Jump Not Signed            | SF=0            |
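
Table 7.1 translates directly into a condition check. The following sketch takes the 8-bit condition code and the three flags and reports whether the jump would be taken. The function name `jump_taken` is hypothetical, and the handling of codes not listed in the table is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate a Table 7.1 condition code against the status flags. */
bool jump_taken(uint8_t cond, bool zf, bool sf, bool of) {
    switch (cond) {
    case 0x00: return true;             /* JMP: unconditional     */
    case 0x01: return zf;               /* JE:  ZF=1              */
    case 0x02: return !zf;              /* JNE: ZF=0              */
    case 0x03: return !zf && sf == of;  /* JG:  ZF=0 and SF=OF    */
    case 0x04: return sf == of;         /* JGE: SF=OF             */
    case 0x05: return sf != of;         /* JL:  SF<>OF            */
    case 0x06: return zf || sf != of;   /* JLE: ZF=1 or SF<>OF    */
    case 0x07: return sf;               /* JS:  SF=1              */
    case 0x08: return !sf;              /* JNS: SF=0              */
    default:   return false;            /* assumption: unlisted codes never jump */
    }
}
```

A `switch` keeps the mapping one-to-one with the table rows, which makes it easy to compare the code against the table when a condition is added or changed.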

## 7.2 Appendix B. Project Management Artefacts

### 7.2.1 Project Initiation Document

#### **Introduction**

Field-Programmable Gate Array (FPGA) devices are a powerful and versatile solution for many electronics applications, including digital signal processing and high-speed test and measurement tools. I will use this project to learn more about FPGA development and CPU architecture, and apply that knowledge to address the need for a side-microprocessor in many FPGA-based applications.

Modern computing and electronics equipment, such as function generators, oscilloscopes, and spectrum analysers, uses FPGAs to implement compute-intensive logic. These FPGAs are often accompanied by a small, low-cost microprocessor that supervises the design and provides interfaces to external peripherals.

The aim of this project is to implement this side-microprocessor inside the FPGA to save on BOM cost, PCB space, and power, all of which contribute to higher development and product costs. Although removing the external microprocessor saves cost, the product may need a larger FPGA to accommodate the embedded microprocessor. The project will produce a small soft-core CPU design and a compiler.

Although there is no direct client in this project, I believe this project will produce an attractive product for FPGA-based product designers wishing to employ an embedded processor solution.

#### **Business Case**

I will build on my interest in FPGA development and apply what I learn to the problems that arise from using a side-microprocessor in FPGA-based applications.

The requirement of a side-microprocessor to control and provide external interfaces to FPGA-based applications adds significantly to both development and project costs. Firstly, including an external microprocessor in a design requires more PCB space and design consideration, adding to the development time and cost of the project. The external microprocessor may also require a licensed compiler to compile and load code onto it, adding further cost. In addition, the microprocessor's on-chip memory may not be large enough to store the compiled code, in which case an external flash memory chip is also required.

Moving to an integrated microprocessor on the FPGA brings many significant advantages: reduction of required PCB space and development time, lower BOM (bill of materials) cost, and better in-field updating.

Releasing updates to embedded projects is a challenging problem. With the integrated solution, FPGA bitstreams and the soft-microprocessor code can be bundled together, making it much easier to update products in the field without sending an engineer to the location or providing complicated instructions which require specific equipment (e.g. in-circuit debuggers).

#### **Project Objectives**

The outcome of the project will be a small, portable, FPGA-based CPU core that electronic product designers can embed into their products as an alternative to a physical side-microprocessor.

#### **Core Deliverables**

- 1. (Core deliverable) To improve my knowledge and experience of FPGA development, CPU architecture, and low-level programming.
- 2. (Core deliverable) To build a working and operational soft-core CPU core capable of performing simple tasks.
- 3. (Core deliverable) To provide product designers with an affordable alternative to a side-microprocessor in their FPGA-based products.
- 4. (Core deliverable) To provide technical documentation and a specification for the embedded core.
- 5. (Sub deliverable) To provide embedded products a convenient solution to in-field updating.
- 6. (Sub deliverable) To provide easy interfacing between the FPGA design and the embedded core.

#### **Initial Scope**

#### **Core Deliverables**

These deliverables are the base requirement for the project to be released in a functional and worthwhile state.

- 1. (Core deliverable) A small, portable, instantiate-able, FPGA-based CPU core.
- 2. (Core deliverable) A C-like programming interface. A compiler taking input of a C-like grammar and outputting executable machine code for the embedded core. The machine code can be embedded into the FPGA bitstream and loaded onto the FPGA to run. Time estimate: 1 month.
- 3. (Core deliverable) A 16-bit RISC instruction set architecture (ISA). The core (1) will decode and execute instructions encoded in this format. The compiler (2) will output machine code in this format. The ISA will support: fixed length instructions; 12-bit immediate values; primitive arithmetic instructions (ADD, SUB, MUL, etc.); GPIO read and write instructions; RAM stack operators (PUSH, POP). A custom ISA will be designed and implemented (see subsection 7.2.1).

#### **Extended Deliverables**

These deliverables may not be achievable in the time frame specified in subsection 7.2.1. They may require extra time, more experience and skill, or resources that are currently unattainable.

- 1. GCC/LLVM/8CC compiler backend for C programming.
- 2. Wishbone interface for easier modularity and inter-module communication.
- 3. Multi-core design with Wishbone (2).
- 4. Single-step debugging interface (with JTAG?).

- 5. Configurable build options (register/bus widths, optimisations/pipelining, user/privileged user mode to support modern operating systems).
- 6. Memory management modules to provide protected and virtual memory lookup tables.

#### **Resources and Dependencies**

For the first half of the development cycle, the core can be developed and verified using Verilator, a Verilog simulator and test suite, and iSim, a VHDL and Verilog simulator.

The second half of development will require deploying and debugging on real hardware. This will require an FPGA development kit. To better emulate customer products, the development kit should feature common components such as LEDs, GPIO, USB interface, flash-based storage and memory, and optionally an analogue audio output port.

I am targeting the low-to-middle range of FPGA devices, specifically the popular, affordable, yet feature-rich Spartan-6 and Artix-7 families. During my placement I gained experience with Xilinx FPGAs, so I will target them for this project to reduce risk and development time.

The following FPGA development kits are suitable for this project:

- 1. MiniSpartan6+ Scarab Hardware \$79 (already owned) (MiniSpartan6+, 2014). The MiniSpartan6+ features a Spartan-6 XC6SLX9 FPGA, 8 LEDs, 2 digital and analogue headers, FT2232 FTDI USB to JTAG, 64Mb SPI flash memory, 32MB SDRAM, an audio output jack, and a MicroSD socket.
- 2. Arty Artix-7 FPGA Development Board Digilent \$100 (Arty Artix-7 FPGA Development Board, 2015). The Arty development board features a larger Artix-35T FPGA with over 20x the number of logic cells and block memory compared to the LX9 in the MiniSpartan6+. The board components include 256MB DDR3 RAM, 16MBx4 SPI flash memory, USB-JTAG, 8 LEDs (4 of which are RGB), 4 switches, 4 buttons, and multiple Pmod connectors.

The greater number of IO options and larger FPGA make the Arty board better suited to emulating real customer products.

The project will require a computer or laptop to develop the core and compiler on and continuous integration systems to perform testing on the incremental builds. For the project demo, an oscilloscope (already owned) or digital logic analyser may be required to demonstrate some of the core's features.

# **Method of Approach**

Development of the core and compiler will be done in separate stages of the project (see subsection 7.2.1). The two deliverables will be split into 2 sub-projects. Both sub-projects will employ the Agile development process, using Agile's sprints to split up tasks into sub-tasks and Agile's scrums to discuss progress, features, and changes.

Technologies used will be:

- 1. Verilog: A hardware description language used to code the internal FPGA design.
- 2. C: A low-level programming language to develop the compiler and assembler.
- 3. Verilator: A C++ Verilog simulator and unit testing framework for verifying the FPGA design. Unit tests will be written for each component of the core: register set, decoder, arithmetic logic unit (ALU), and IO. This will aid the sprint approach by ensuring that requirements implied by the unit tests do not break over development iterations.
- 4. iSim: A Verilog and VHDL simulator. This will be used to visualize the timings of internal signals within the FPGA components such as the decoder and ALU.

# **Initial Project Plan**

#### Project time line breakdown

The project will be split into 4 parts:

- 1. Project information gathering and requirement generation.
- 2. Active development sprints.
- 3. Test and verification.
- 4. Final report and clean up.

The following table breaks down the 4 parts into sub-tasks and provides their descriptions and estimated start and end times.

**Table 7.2:** Initial Project Plan time breakdown \*Expected time.
Shaded stages are time varying periods for bug fixing.

| Stage                                    | Start Date* | End Date* | Project Deliverables                                                                         |
|------------------------------------------|-------------|-----------|----------------------------------------------------------------------------------------------|
| 1.0. Project Initiation                  |             | 02 Feb    | Process Initiation Document                                                                  |
| 1.1. Research and requirement gathering  | 02 Feb      | 09 Feb    | Existing soft-core processor designs, constraints, features, implementation.                 |
| 1.2. Core high level design              | 10 Feb      | 17 Feb    | Soft-core CPU architecture; Register definitions; Bus widths; Initial ISA instruction table. |
| 2.1. Core development sprints            | 18 Feb      | 10 Mar    | Iterative soft-core development sprints                                                      |
| 2.1.1. Core testing and verification     | 11 Mar      | 15 Mar    | Any tasks required to meet design constraints.                                               |
| 2.2. Compiler development sprints        | 15 Mar      | 31 Mar    | Iterative compiler development sprints                                                       |
| 2.2.1. Compiler testing and verification | 10 Apr      | 14 Apr    | Any tasks required for compiler to produce correct code generation.                          |
| 3.1. Real hardware deployment            | 15 Apr      | 19 Apr    | Deployment of Verilog code to a real FPGA device.                                            |
| 3.2. Final verification                  | 20 Apr      | 24 Apr    | Verification for FPGA design and compiler.                                                   |
| 4.1. Complete final report               | 25 Apr      | 4 May     | PRCO304 Final Report.                                                                        |

#### **Control Plan**

Management of the project will be done using the PRINCE2 technique.

The project initiation document (this) describes high-level requirements, objectives, and business cases.

Weekly highlight reports will be written and weekly meetings held to track task progress and to identify any challenges that need attention.

Project risks and challenges are identified in subsection 7.2.1 along with proposed solutions for their occurrence.

#### **Initial Risk Assessment**

The following subsection outlines potential project risks and a suitable management strategy for each.

#### 1. Real hardware synthesis.

A challenge in developing for FPGAs, CPLDs, and other programmable logic devices is realizing the HDL code on real hardware. The real implementation can behave differently to the simulated design - a major (and expensive) problem. This issue is caused by not meeting the physical constraints required by the FPGA. These include timing, space, and power constraints.

To help reduce this issue, I will utilise the ISE Design Suite's constraint validator tool. Before deploying to real hardware, the design must meet the constraints I declare that enable it to run correctly on real hardware. I can use these constraints to identify how much space, time, and power, I have left to implement features.

#### 2. HDL programming.

A HDL (Hardware Description Language) is a text-based language used to describe hardware components and their inter-connections. My FPGA core will be programmed in Verilog, a HDL that is closer to C than VHDL is. Very little of this language is taught in the Computer Science course, so external learning resources will be required for me to use it effectively.

My placement company, Spirent Communications, a telecommunications signal generator company, makes heavy use of FPGA devices in its products, and there I gained valuable knowledge of the FPGA development life cycle and deployment. To build on the knowledge of the required tools (ISE Design Suite) gained during my placement, I shall learn from HDL programming books such as HDL Programming Fundamentals: VHDL and Verilog (?).

#### 3. Compiler development time.

A compiler will be required to provide an easy method of running user code on the FPGA core. The compiler is a lesser deliverable but will take considerable time to implement.

If time is short, the compiler may only convert and assemble an assembly-like language with simple features (goto statements, stack management i.e. stack frames). If time is available, a better grammar can be developed with common language features such as if statements, scope blocks, and variables.

The possibility also exists of using an existing compiler, such as GCC, LLVM, or 8CC, and creating a custom back-end for the FPGA core's architecture. My brief experience with these compilers and their poor documentation suggests it may be quicker to build a compiler from scratch than to create a custom back-end. A short period of time will be given to exploring these compilers, as a custom back-end may allow the use of more language features (ANSI C) instead of a small subset. This would allow for a more complex demo of the FPGA core.

#### 4. Schedule overrun.

This is a complex project with multiple sub-projects (core & compiler). Delivering the large number of features will require a tight development schedule, which is prone to over-running.

I can identify and account for this through weekly progress updates, scheduled with the project supervisor, outlining feature progress and challenges. If the schedule slips significantly due to an unforeseen problem or unreasonable requirement, this shall be brought up in the following meeting and a solution will be agreed upon, be it modifying a deliverable or allowing extra time for the feature.

#### 5. Technology failure.

To overcome the risks of data loss all code and resources will be stored in local and remote Git repositories. In the event of the FPGA development kit failing, be it a component on the board or the FPGA itself, either: (a) a demo of the FPGA core not showing features of the failed component; or (b) a simulated design that meets constraints imposed by the physical FPGA will be provided and demonstrated in a simulator.

# **Quality Plan**

The following quality strategies will be employed to achieve a successful project and product.

Table 7.3: Initial Quality Plan.

| Quality Check (QC)               | Strategy                                                                                                                                                                                                                                                                  |  |
|----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| QC1. Requirement reality         | Requirements will be checked during the weekly highlight reports to verify that when requirements begin to be implemented they are realistic and achievable within the time frame specified in subsection 7.2.1.                                                          |  |
| QC2. Soft-core design validation | While continuous testing and verification will be performed on the core (unit test, FPGA constraints reached), a variable period of time (stage 2.1.1) will be allocated after the development sprints to fix bugs and unexpected behaviour, and polish the final design. |  |
| QC3. Compiler validation         | As with QC2, unit tests and continuous integration tests will be performed for each code change to validate that changes do not produce bad code generation. A time varying period (stage 2.2.1) is also allocated to fix and polish the compiler. |  |
| QC4. Real hardware performance   | Electronic test equipment, such as oscilloscopes and digital logic analysers, will be used to verify the correct behaviour of the code on real hardware. Initial risk (1) states that there is a risk that the FPGA-deployed core will behave differently to the simulation. |  |

# Legal, Social, and Ethical Considerations

Legal considerations need to be taken into account due to already existing commercial soft-processor designs. Existing soft-processor designs include the ARM family of soft-cores (?) and Xilinx's MicroBlaze soft-core (?). Emulating another soft-core processor's architecture may result in legal challenges even if I do not distribute the final product. As this is a learning project, instead of emulating another architecture, I will design my own architecture from the ground up to learn first-hand the design considerations, implementation, and verification of CPU designs.

Social and ethical considerations are not applicable for this project.

# 7.2.2 Highlight Reports

PRCO304: Highlight Report 1

Name: Ben Lancaster

Date: 06/02/2018

Active project stage: Stage 1.1: Research and Requirement Gathering

#### Review of work undertaken:

This week was assigned to work on stage 1.1: Research and requirement gathering.

# Research and requirement gathering:

Research into existing soft-core processor designs has been started to identify their features, targets, and advantages and disadvantages. Key existing soft-core processors found are:

- Xilinx's MicroBlaze: a 32-bit Xilinx FPGA embeddable core capable of running operating systems, like Linux. Exposes a configuration GUI to customise the build of the processor to suit designers' requirements (like number of GPIO, interrupts, timers, etc.).
- ARM Cortex-A9: a 32-bit Xilinx and Altera FPGA core. Features out-of-order execution, compatible with existing ARM Thumb2 C compilers, and multi-core processing.

I have used this research to guide my soft-core processor's requirements and architecture. To document and finalize my processor's design and requirements, I have started a processor specification and reference document. This document outlines the processor's features, architecture, compatibility, and instructions.

### Additional progress:

- Version control set up for documentation, highlight reports, and code bases.

#### **Risks and Challenges:**

Urgent risks:

New risks:

Existing risks:

RC4: Schedule overrun. A Gantt time chart has been created to better visualize task durations and requirements.

#### Plan of work for the next week:

Work will begin on Stage 1.2: Core high level design.

Finalised specifications and architecture of the soft-core processor will be put into a processor specification and reference document.

The architecture, control, and pipelines will be visualised in this document.

#### Date(s) of supervisory meeting(s) since last Highlight:

This is the 1st highlight report.

30/01/18 - An introductory meeting was held to discuss the project initiation document (PID) and gain feedback on the project.

# Notes from supervisory meeting(s) held since last Highlight:

Ensure risks are carefully explored and project core deliverables are realistic and achievable.

PRCO304: Highlight Report 2

Name: Ben Lancaster

**Date:** 15/02/2018

Active project stage: Stage 1.2: Core high level design

#### Review of work undertaken:

This week was assigned to work on stage 1.2: Core high level design.

# Core high level design:

I have spent this week defining a processor specification and creating a processor specification/reference guide booklet (see attached). This booklet will contain both high-level and technical details regarding the design and implementation of the processor, including: register sets, control and pipelining strategies, the ISA and each instruction, and the compiler and how to use it.

This booklet will be developed over the life cycle of the project. Although the specification has been clearly defined, the booklet will be incrementally updated as processor features/requirements are added to the implementation (such as instructions, modules, and compiler features).

Currently the reference booklet contains: register set definitions, several primitive instructions, and a brief introduction to instruction cycle timing.

# Risks and Challenges:

Urgent risks:

New risks:

Existing risks:

RC4: Schedule overrun. A Gantt time chart has been created to better visualize task durations and requirements.

Resolved risks:

RC4: Schedule overrun. A Gantt time chart has been created to better visualize task durations and requirements. (See attached time management chart indev.)

## Plan of work for the next week:

Work will begin on Stage 2.0: Core dev. Register set implementation.

The register set module will be implemented in Verilog for the processor. Unit tests will be created to verify the timing/behaviour of the module.

The processor specification/reference booklet will be updated to describe how the register set has been implemented in the processor.

#### Date(s) of supervisory meeting(s) since last Highlight:

08/02/18 15:00 - 15:40

# Notes from supervisory meeting(s) held since last Highlight:

Discussion included comparing existing processors' (ARM, x86) features (privileged instructions, interrupts, IO, variable-length ISA) and designs (ISA and pipelining) to this processor.

PRCO304: Highlight Report 3

Name: Ben Lancaster

Date: 20/02/2018

**Active project stage:** Stage 2.0: Core Register-set Implementation.

#### Review of work undertaken:

This week was assigned to work on stage 2.0: Core Register-set Implementation.

# **Core Register-set Implementation:**

Good progress has been made implementing the PRCO processor's register set in Verilog. The register set consists of 8 16-bit wide general purpose registers, labelled rA through rH, with dual-port read and single-port write.
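As a rough software illustration of the register set described above, the following C sketch models eight 16-bit registers with two read ports and one write port. It is not the Verilog implementation; the type and function names (`regfile_t`, `regfile_read`, `regfile_write`) and the write-enable argument are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8 /* rA through rH */

/* Software model of the register set: eight 16-bit registers. */
typedef struct {
    uint16_t regs[NUM_REGS];
} regfile_t;

/* Dual-port read: two registers are selected and read together. */
void regfile_read(const regfile_t *rf, unsigned sel_a, unsigned sel_b,
                  uint16_t *out_a, uint16_t *out_b)
{
    assert(sel_a < NUM_REGS && sel_b < NUM_REGS);
    *out_a = rf->regs[sel_a];
    *out_b = rf->regs[sel_b];
}

/* Single-port write: at most one register is updated per call.
 * write_en is a hypothetical enable signal; when 0 nothing changes. */
void regfile_write(regfile_t *rf, unsigned sel_d, uint16_t value, int write_en)
{
    assert(sel_d < NUM_REGS);
    if (write_en) {
        rf->regs[sel_d] = value;
    }
}
```

A two-operand instruction would call `regfile_read` once to fetch both operands and `regfile_write` once to store the result.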

Implementation progress is approximately 1 week ahead of schedule. Because of this, work has also been done on the decoder and ALU modules.

The control/sequencing pipeline has been considered. The pipeline needs to work for time-varying functions (such as memory writes). The current plan is to give each module outputs to signal when it has finished so the following module can safely read in data and operate on it. A handshake between modules currently seems overkill due to the relatively simple structure but may be considered later in the project.
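The "signal when finished" idea can be sketched in C as a loop that only moves to the next module once the current one reports completion. This is an illustrative model only: the stage names and the per-stage cycle counts are assumptions, not the core's actual modules or timings.

```c
#include <assert.h>

/* Hypothetical stage names; the real core's module list may differ. */
enum { STAGE_FETCH, STAGE_DECODE, STAGE_EXECUTE, STAGE_WRITEBACK, STAGE_COUNT };

/* cycles_needed[s] is the number of clock cycles stage s takes before it
 * raises its "done" output. Returns the total cycles for one instruction. */
unsigned run_instruction(const unsigned cycles_needed[STAGE_COUNT])
{
    unsigned total = 0;
    for (int stage = 0; stage < STAGE_COUNT; stage++) {
        assert(cycles_needed[stage] >= 1);
        unsigned elapsed = 0;
        int done = 0;
        /* The next stage may not start until this one signals done. */
        while (!done) {
            elapsed++;
            done = (elapsed >= cycles_needed[stage]);
        }
        total += elapsed;
    }
    return total;
}
```

A slow stage, such as a multi-cycle memory write, simply delays its done signal, and every later stage waits without needing a two-way handshake.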

# **Risks and Challenges:**

# Urgent risks:

New risks:

RC5: Complex memory operations (PUSH, POP) may require multiple instructions. PUSH/POP might be split into: (1) Inc/dec stack pointer; (2) Read RAM[stack pointer]. The compiler will be able to resolve this issue.

## Existing risks:

Resolved risks:

### Plan of work for the next week:

Work will begin on Stage 2.1: Core dev. Decoder implementation.

Some progress has already been made but the decoder is not finished.

The processor specification/reference booklet will continue to be updated with implementation specific details of the processor.

#### Date(s) of supervisory meeting(s) since last Highlight:

13/02/18 09:40

## Notes from supervisory meeting(s) held since last Highlight:

This discussion was held over email; it was decided that a physical meeting would not be beneficial as the current project stage was starting the *PRCO Processor Reference Guide* booklet. Progress on the booklet was shared, along with a brief overview of the Register-set and Decoder implementation.

# PRCO304: Highlight Report 4

Name: Ben Lancaster

Date: 28/02/2018

#### Active project stage:

Stage 2.1: Core: Decoder Implementation. Stage 2.2: Core: ALU, RAM Implementation.

#### Review of work undertaken:

#### Stage 2.1: Core: Decoder Implementation:

Simple instructions (ADD, ADDI, MOV, MOVI, SUB, SUBI, LW, SW) can now be decoded. The decoder has been integrated into the pipeline and it can choose and set up appropriate dependencies for the instruction.

### Stage 2.2: Core: ALU, RAM Implementation:

ALU development has started. Some basic operations such as ADD, ADDI, SUB, SUBI, and pass-through ops such as MOV and MOVI, have been implemented. On-chip RAM development will be starting this week.

### Core: Pipeline/control system

A significant development breakthrough for the control/pipeline system has been achieved. I'm calling it a feed-forward pipeline as the flow of control only moves in the forward direction, and only once the previous module has completed.

#### Compiler: Text parser development starting:

Work on a simple text parser has begun, including file opening, reading character by character, and a parser stack.

#### **Risks and Challenges:**

Urgent risks:

New risks:

Existing risks:

RC5: Complex memory operations (PUSH, POP) may require multiple instructions. PUSH/POP might be split into: (1) Inc/dec stack pointer; (2) Read RAM[stack pointer]. The compiler will be able to resolve this issue.

Resolved risks:

#### Plan of work for the next week:

Work will continue for 1 more week on stage 2.1 and 2.2 as per the time plan.

The processor specification/reference booklet will continue to be updated with implementation specific details of the processor.

## Date(s) of supervisory meeting(s) since last Highlight:

21/02/18 13:00 - 13:4

## Notes from supervisory meeting(s) held since last Highlight:

Discussion included improving the time management Gantt chart by showing task dependencies, and potential final demo ideas (store an ASCII string on SD card/external memory and have the processor loop over and print each character out over RS232).

PRCO304: Highlight Report 5

Name: Ben Lancaster

**Date:** 07/03/2018

#### Active project stage:

(ON-TIME) Stage 2.2: Core: ALU, RAM Implementation.

(EARLY) Stage 3.0: Compiler: Code-generation.

#### Review of work undertaken:

### (ON-TIME) Stage 2.2: Core: ALU, RAM Implementation:

CMP and JMP instructions have been implemented. The CMP instruction is the only three-register instruction (Type 3) and required a bit of reworking to implement. The CMP instruction subtracts Ra from Rb and sets appropriate status bits (SR\_Z, SR\_O, SR\_E, SR\_O) into the Rd register. The JMP instruction also required a bit of reworking as it affects the Program Counter. It is passed an 8-bit immediate containing jump conditions (JMP\_EQ, JMP\_GE, JMP\_LT, etc.) and compares against the SR register specified in the CMP instruction.
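The interaction between CMP and JMP can be sketched in C as follows. This is a simplified model under stated assumptions: the status bits `ST_EQ`, `ST_LT`, and `ST_GT` and the numeric values of the `JMP_*` conditions are hypothetical (the report names the bits and conditions but not their layout), operands are treated as signed 16-bit values, and a zero condition field is treated as "always jump" because the unconditional jumps in the Highlight 7 listing encode a zero immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits written by CMP into its destination register. */
enum { ST_EQ = 1u << 0, ST_LT = 1u << 1, ST_GT = 1u << 2 };

/* Hypothetical condition masks carried in JMP's 8-bit immediate. */
enum { JMP_EQ = ST_EQ, JMP_LT = ST_LT, JMP_GE = ST_EQ | ST_GT };

/* CMP: subtract Ra from Rb and return the status bits for the result. */
uint16_t cmp_status(uint16_t ra, uint16_t rb)
{
    int32_t diff = (int32_t)(int16_t)rb - (int32_t)(int16_t)ra;
    if (diff == 0) {
        return ST_EQ;
    }
    return diff < 0 ? ST_LT : ST_GT;
}

/* JMP: return the next program counter value. The jump is taken when any
 * requested condition bit is set in the status register. */
uint16_t jmp_next_pc(uint16_t pc, uint16_t target, uint8_t cond, uint16_t status)
{
    assert((cond & ~(ST_EQ | ST_LT | ST_GT)) == 0);
    if (cond == 0 || (status & cond) != 0) {
        return target;
    }
    return (uint16_t)(pc + 1);
}
```

Keeping the comparison result in an ordinary register means the jump can be evaluated later, against whichever register the earlier CMP wrote.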

### (EARLY) Stage 3.0: Compiler: Code-generation.

Work has started ahead-of-schedule on code-generation for the compiler. I have begun implementing functions to encode instructions into the ISA's machine-code format. In addition, the compiler will also print out human-readable assembly in AT&T format.
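A minimal sketch of such encoding functions is given below. The field layout, opcode numbers, and register numbers are not taken from the ISA specification; they are inferred from the machine-code column of the Highlight 7 disassembly listing and should be read as assumptions. The function names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode and register numbers inferred from the Highlight 7 listing. */
enum { OP_LW = 1, OP_SW = 2, OP_MOV = 3, OP_MOVI = 4,
       OP_ADD = 8, OP_ADDI = 9, OP_JMP = 12 };
enum { REG_AX = 0, REG_CX = 2, REG_BP = 6, REG_SP = 7 };

/* Immediate form: [15:11] opcode | [10:8] register | [7:0] immediate. */
uint16_t encode_imm(unsigned op, unsigned reg, int imm8)
{
    assert(op < 32 && reg < 8 && imm8 >= -128 && imm8 <= 255);
    return (uint16_t)((op << 11) | (reg << 8) | ((unsigned)imm8 & 0xFFu));
}

/* Register form: [15:11] opcode | [10:8] first register |
 * [7:5] second register | [4:0] low field (zero in every listed word). */
uint16_t encode_reg(unsigned op, unsigned r1, unsigned r2, unsigned low5)
{
    assert(op < 32 && r1 < 8 && r2 < 8 && low5 < 32);
    return (uint16_t)((op << 11) | (r1 << 8) | (r2 << 5) | low5);
}
```

For example, `encode_imm(OP_ADDI, REG_SP, -1)` yields `0x4fff` and `encode_reg(OP_SW, REG_BP, REG_SP, 0)` yields `0x16e0`, matching the first two words of the listing.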

## **Real-hardware Implementation:**

I have also begun testing the implementation on the FPGA development board. Doing this early allows me to fix critical synthesis problems earlier, reducing risk for the project and demonstration. Figure 7.1 shows the FPGA core running on the FPGA development board.

# **Risks and Challenges:**

Urgent risks:

New risks:

Existing risks:

RC5: Complex memory operations (PUSH, POP) may require multiple instructions. PUSH/POP might be split into: (1) Inc/dec stack pointer; (2) Read RAM[stack pointer]. The compiler will be able to resolve this issue.

Resolved risks:

#### Plan of work for the next week:

Work will begin on the integration of a UART (RS232) communication protocol, allowing us to better demonstrate functionality of the processor and connect to other peripherals.

Work will also begin on implementing a single-step instruction button, allowing better demonstration of the core. Currently the demonstration only lasts approximately 800ns.

The processor specification/reference booklet will continue to be updated with implementation specific details of the processor.

#### Date(s) of supervisory meeting(s) since last Highlight:

01/03/18 (bi-weekly highlight meeting)

### Notes from supervisory meeting(s) held since last Highlight:

Biweekly meetings are held instead of weekly.

# PRCO304: Highlight Report 6

Name: Ben Lancaster

**Date:** 15/03/2018

# Active project stage:

(ON-TIME) Stage 2.3: Core: GPIO, Communication.

(EARLY) Stage 3.2: Compiler: Assembler.

#### Review of work undertaken:

Single-instruction stepping has been implemented, allowing an external button to step an instruction (a key demo requirement!).

# (ON-TIME) Stage 2.3: Core: GPIO, Communication Implementation:

A UART module library has been included in the core along with a FIFO buffer. The UART works well with single-instruction stepping, but when free-running the buffer fills up immediately and the output appears in random order.

# (EARLY) Stage 3.2: Compiler: Assembler.

The assembler identifies instructions that require offsets and immediates to be calculated. The assembler can now modify instructions to fill in missing data.
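This fix-up step can be sketched in C as a pass over the encoded program that patches each instruction still waiting on a label address. The structure and function names are hypothetical, and the assumption that the missing value sits in the low 8 bits of the word is inferred from the Highlight 7 listing (for example, `MOVI $d, Cx` assembling to `220d` for the call to `foo` at address `0x0D`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_LABEL (-1)

typedef struct {
    uint16_t word;      /* encoded instruction, immediate field left as zero */
    int      label_ref; /* index of the label whose address is needed, or NO_LABEL */
} asm_instr_t;

/* Fill the low 8 bits of every instruction that references a label,
 * once all label addresses are known. */
void resolve_labels(asm_instr_t *prog, size_t count,
                    const uint16_t *label_addr, size_t num_labels)
{
    for (size_t i = 0; i < count; i++) {
        int ref = prog[i].label_ref;
        if (ref == NO_LABEL) {
            continue;
        }
        assert((size_t)ref < num_labels && label_addr[ref] <= 0xFF);
        prog[i].word = (uint16_t)((prog[i].word & 0xFF00u) | label_addr[ref]);
    }
}
```

Deferring the patch until all addresses are known lets a call refer to a function that is emitted later in the program.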

#### **Risks and Challenges:**

**Urgent risks:** 

New risks:

RC6: UART FIFO fills up too quickly, resulting in bad output.

### Existing risks:

Resolved risks:

RC5: Complex memory operations (PUSH, POP) may require multiple instructions. Core will not support PUSH/POP as they are too complex. Compiler will output 2 instructions to emulate a PUSH/POP.
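The two-instruction emulation can be modelled in C as below. The instruction pairs in the comments follow the PUSH and POP sequences visible in the Highlight 7 listing; the `machine_t` type, the RAM size, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 256 /* hypothetical RAM size for this model */

typedef struct {
    uint16_t ram[RAM_WORDS];
    uint16_t sp; /* stack pointer: index of the current top of stack */
} machine_t;

/* PUSH is emitted as: ADDI $-1, Sp ; SW value, +0(Sp) */
void emulate_push(machine_t *m, uint16_t value)
{
    assert(m->sp > 0);
    m->sp = (uint16_t)(m->sp - 1); /* ADDI $-1, Sp */
    m->ram[m->sp] = value;         /* SW value, +0(Sp) */
}

/* POP is emitted as: LW reg, +0(Sp) ; ADDI $+1, Sp */
uint16_t emulate_pop(machine_t *m)
{
    assert(m->sp < RAM_WORDS);
    uint16_t value = m->ram[m->sp]; /* LW reg, +0(Sp) */
    m->sp = (uint16_t)(m->sp + 1);  /* ADDI $+1, Sp */
    return value;
}
```

The stack grows downwards in this model, so the stack pointer starts at `RAM_WORDS` when the stack is empty.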

#### Plan of work for the next week:

Work will continue on parsing expressions in the compiler (if, for, while, etc.) and their codegen.

The processor specification/reference booklet will continue to be updated with implementation specific details of the processor.

The final report document content will be started (structure already laid out).

#### Date(s) of supervisory meeting(s) since last Highlight:

12/03/18 (bi-weekly highlight meeting)

# Notes from supervisory meeting(s) held since last Highlight:

RC5: Confirmation that PUSH/POP concepts will be split into 2 instructions due to limited complexity of the processor core.

## PRCO304: Highlight Report 7

Name: Ben Lancaster

Date: 21/03/2018

# Active project stage:

(EXTENDED) Stage 2.3: Core: GPIO, Communication.

(ON-TIME) Stage 3.2: Compiler: Assembler. (ON-TIME) Stage 3.3: Compiler: Verification.

#### Review of work undertaken:

HALT behaviour has been added.

# (ON-TIME) Stage 3.2: Compiler: Assembler.

The compiler can now generate code for x86-style function stack frames, where the stack pointer and base pointer are pushed to and popped from the stack when entering and exiting a function. This is the foundation for generating code for passed parameters and local variables. An example is shown in section 7.2.2.

#### (ON-TIME) Stage 3.3: Compiler: Verification.

For the first time, the compiler output has been run on the processor. Two simple programs were run: one to test addition, and the other to test calling functions (without parameters). After fixing some bugs around the JMP instruction behaviour on the processor, both programs were able to run successfully.

# Risks and Challenges:

Urgent risks:

New risks:

### Existing risks:

RC6: UART FIFO fills up too quickly, resulting in bad output.

Resolved risks:

RC5: Complex memory operations (PUSH, POP) may require multiple instructions. Core will not support PUSH/POP as they are too complex. Compiler will output 2 instructions to emulate a PUSH/POP.

#### Plan of work for the next week:

Compiler language control statements such as IF and FOR need to be parsed and codegen'd. This is a requirement for the demo (iterating over contiguous memory and printing to UART?).

The processor specification/reference booklet will continue to be updated with implementation specific details of the processor.

The final report document will continue to be updated.

## Date(s) of supervisory meeting(s) since last Highlight:

12/03/18 (bi-weekly highlight meeting)

## Notes from supervisory meeting(s) held since last Highlight:

RC5: Confirmation that PUSH/POP concepts will be split into 2 instructions due to limited complexity of the processor core.

# **Highlight Attachments**

# Highlight 5



(a) Oscilloscope measurement of the *q\_debug\_instr\_clk* signal running on the MiniSpartan6+ development board.



**(b)** Xilinx iSim simulation view of the *q\_debug\_instr\_clk* signal.

**Figure 7.1:** Initial real-hardware implementation on the MiniSpartan6+ (XC6SLX9-3FTG256) development board showing timing of the *q\_debug\_instr\_clk* signal. This signal is a 1 clock pulse indicating the start of an instruction cycle. In this example, instructions: MOVI \$10, %Ra; MOVI \$10, %Rb; and CMP %Rc, %Ra, %Rb followed by 6 NOP instructions, are used.

We can see that both implementations have a matching 660ns delay between instruction cycles for the same instructions, indicating that the real-hardware FPGA implementation is working correctly.

### Highlight 7

Compiler input file contents:

```
def foo() {
    10 + 1;
}

def main() {
    32;
    foo();
}
```

Compiler output machine code disassembly (pre-optimisation, post assembling):

```
0x00  ADDI  $-1, Sp       4fff  Function/sf entry
0x01  SW    Bp, +0(Sp)    16e0  (null)
0x02  MOV   Bp, Sp        1ee0  main
0x03  MOVI  $20, Ax       2020  NUMBER
0x04  MOVI  $9, Cx        2209  Create return address
0x05  ADDI  $-1, Sp       4fff  (null)
0x06  SW    Cx, +0(Sp)    12e0  PUSH
0x07  MOVI  $d, Cx        220d  call
0x08  JMP   Cx            6200  JMP
0x09  MOV   Sp, Bp        1fc0  Function/sf exit
0x0A  LW    Bp, +0(Sp)    0ee0  POP
0x0B  ADDI  $+1, Sp       4f01  (null)
0x0C  HALT                9000  MAIN HALT

0x0D  ADDI  $-1, Sp       4fff  Function/sf entry
0x0E  SW    Bp, +0(Sp)    16e0  (null)
0x0F  MOV   Bp, Sp        1ee0  foo
0x10  MOVI  $a, Ax        200a  NUMBER
0x11  ADDI  $-1, Sp       4fff  (null)
0x12  SW    Ax, +0(Sp)    10e0  PUSH
0x13  MOVI  $1, Ax        2001  NUMBER
0x14  LW    Cx, +0(Sp)    0ae0  POP
0x15  ADDI  $+1, Sp       4f01  (null)
0x16  ADD   Ax, Cx        4040  BIN ADD
0x17  MOV   Sp, Bp        1fc0  Function/sf exit
0x18  LW    Bp, +0(Sp)    0ee0  POP
0x19  ADDI  $+1, Sp       4f01  (null)
0x1A  LW    Cx, +0(Sp)    0ae0  POP
0x1B  ADDI  $+1, Sp       4f01  (null)
0x1C  JMP   Cx            6200  FUNC RETURN to CALL
```

# 7.3 Appendix C. Other Documents

# 7.3.1 Compiler Functional Requirements



Figure 7.2: PRCO304 compiler Functional requirements and their technical implementation requirements.

# 7.3.2 Compiler Sequence Diagram



Figure 7.3: UML sequence diagram for the PRCO304 compiler.