### Department of Electrical and Computer Engineering, Queen's University

## **ELEC-374 Digital Systems Engineering Laboratory Project**

Winter 2020

## Designing a Simple RISC Computer

### 1. Objectives

The purpose of this project is to design, simulate, implement, and verify a Simple RISC Computer (Mini SRC) consisting of a simple RISC processor, memory, and I/O. You are to use the Altera Quartus II design software for this purpose. The system is to be implemented on the Altera Cyclone III chip (EP3C16F484) of the DE0 evaluation board or on the Altera Cyclone V chip (5CEBA4F23C7) of the DE0-CV evaluation board.

The Mini SRC is similar to the SRC described in the Lab Reader, reproduced from the text by Heuring and Jordan. The Datapath, Control Unit, and Memory Interface for Mini SRC have the same relationship as shown in Figure 4.1 on page 142 of the Lab Reader for the SRC system. The processor design is similar to the information presented in the figures and tables on pages 143 through 167 of the Lab Reader. As part of the learning process for the CPU design project, carefully read the Mini SRC CPU specification and the descriptions of the different lab phases, consult the Lab Reader, refer to the tutorial on Intel Quartus II, ModelSim-Altera, and the DE0/DE0-CV evaluation boards as well as the tutorial on the CPU Design Project, and read the Lecture Slides on Computer Arithmetic, VHDL, and Verilog.

### 2. Processor Specification

The Mini SRC is a 32-bit machine, having a 32-bit datapath and sixteen 32-bit registers R0 to R15, with R0 to R7 as general-purpose registers, R8 and R9 as the return value registers, R10 to R13 as the argument registers, R14 as the return address register (RA), holding the return address for a jal instruction, and R15 as the stack pointer (SP). It also has two dedicated 32-bit registers HI and LO for multiplication and division instructions. Note that Mini SRC does not have a condition code register. Rather, it allows any of the general-purpose registers to hold a value to be tested for conditional branching. The memory unit is 512 words.

The following is a formal definition of the Mini SRC.

#### **Processor State**

PC<31..0>: 32-bit Program Counter (PC)

IR<31..0>: 32-bit Instruction Register (IR)

R[0..15]<31..0>: Sixteen 32-bit registers named R[0] through R[15]

R[0..7]<31..0>: Eight general-purpose registers

R[8..9]<31..0>: Two Return Value Registers

R[10..13]<31..0>: Four Argument Registers

R[14]<31..0>: Return Address Register (RA)

R[15]<31..0>: Stack Pointer (SP)

HI<31..0>: 32-bit HI Register dedicated to keep the high-order word of a Multiplication product, or the Remainder of a Division operation

LO<31..0>: 32-bit LO Register dedicated to keep the low-order word of a Multiplication product, or the Quotient of a Division operation

#### **Memory State**

Mem[0..511]<31..0>: 512 words (32 bits per word) of memory

MDR<31..0>: 32-bit memory data register

MAR<31..0>: 32-bit memory address register

#### I/O State

In.Port<31..0>: 32-bit input port

Out.Port<31..0>: 32-bit output port

Run.Out: Run/halt indicator

Stop.In: Stop signal

Reset.In: Reset signal
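When checking simulation waveforms, it can help to mirror this state in a small software model. The following Python sketch is illustrative only: the class name `MiniSrcState`, its field names, and the `write_reg` helper are hypothetical and not part of the specification.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF   # all registers and memory words are 32 bits wide
MEM_WORDS = 512       # Mem[0..511]
RA, SP = 14, 15       # R14 is the return address register, R15 the stack pointer


@dataclass
class MiniSrcState:
    """Hypothetical container for the processor, memory, and I/O state."""
    pc: int = 0
    ir: int = 0
    r: list = field(default_factory=lambda: [0] * 16)
    hi: int = 0
    lo: int = 0
    mem: list = field(default_factory=lambda: [0] * MEM_WORDS)
    mar: int = 0
    mdr: int = 0
    in_port: int = 0
    out_port: int = 0
    run: bool = True   # models the Run.Out indicator

    def write_reg(self, index: int, value: int) -> None:
        """Store a value in R[index], truncated to 32 bits."""
        self.r[index] = value & MASK32


state = MiniSrcState()
state.write_reg(SP, 0x1FF)   # e.g. point the stack pointer at the last memory word
```

Keeping every write masked to 32 bits makes the model behave like fixed-width hardware registers rather than Python's unbounded integers.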

The Arithmetic Logic Unit (ALU) performs 12 operations: addition, subtraction, multiplication, division, shift right, shift left, rotate right, rotate left, logical AND, logical OR, Negate (2's complement), and NOT (1's complement).
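As a reference for testbench expected values, the ten single-word operations can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the required hardware design: the function name `alu` is hypothetical, shift right is assumed to be a logical shift, and only the low five bits of the count are assumed to be used. Multiplication and division, which produce two-word results, are left out of this sketch.

```python
MASK32 = 0xFFFFFFFF


def alu(op: str, a: int, b: int = 0) -> int:
    """Return the 32-bit result of a single-word ALU operation.

    For the unary operations (neg, not) only `a` is used.
    """
    a &= MASK32
    b &= MASK32
    count = b & 31  # assumption: only the low 5 bits of the count matter
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "shr":
        return a >> count                      # assumption: logical shift
    if op == "shl":
        return (a << count) & MASK32
    if op == "ror":
        return ((a >> count) | (a << (32 - count))) & MASK32
    if op == "rol":
        return ((a << count) | (a >> (32 - count))) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "neg":
        return (-a) & MASK32                   # 2's complement
    if op == "not":
        return ~a & MASK32                     # 1's complement
    raise ValueError(f"unknown ALU operation: {op}")


print(hex(alu("ror", 0x00000001, 1)))  # 0x80000000
print(hex(alu("neg", 1)))              # 0xffffffff
```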

Each Mini SRC instruction is one word (32 bits) long. The instructions can be categorized as Load and Store instructions, Arithmetic and Logical instructions, Conditional Branch and Jump instructions, Input/Output instructions, and miscellaneous instructions. There are no push and pop instructions (they can be implemented with other instructions). The following addressing modes are supported: Direct, Indexed, Register, Register Indirect, Immediate, and Relative.

#### 2.1 Instruction Formats

There are five instruction formats, as shown in the table below.

| Name       | Bits 31..27 | Bits 26..23    | Bits 22..19    | Bits 18..15                     | Bits 14..0 | Comments                             |
|------------|-------------|----------------|----------------|---------------------------------|------------|--------------------------------------|
| Field size | 5 bits      | 4 bits         | 4 bits         | 4 bits                          | 15 bits    | All instructions are 32 bits long    |
| R-Format   | Op-code     | Ra             | Rb             | Rc                              | Unused     | Arithmetic/Logical                   |
| I-Format   | Op-code     | Ra             | Rb             | Constant C/Unused (bits 18..0)  |            | Arithmetic/Logical; Load/Store; Imm. |
| B-Format   | Op-code     | Ra             | C2             | Constant C (bits 18..0)         |            | Branch                               |
| J-Format   | Op-code     | Ra             | Unused (bits 22..0) |                            |            | Jump; Input/Output; Special          |
| M-Format   | Op-code     | Unused (bits 26..0) |           |                                 |            | Misc.                                |

The instruction formats for different categories are detailed below:

- Load and Store instructions: operands in memory can be accessed only through load/store instructions.
  - (a) ld, ldi, st: I-Format

| 31..27  | 26..23 | 22..19 | 18..0 |
|---------|--------|--------|-------|
| Op-code | Ra     | Rb     | C     |
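- Arithmetic and Logical instructions:
  - (a) add, sub, and, or, shr, shl, ror, rol: R-Format

| 31..27  | 26..23 | 22..19 | 18..15 | 14..0  |
|---------|--------|--------|--------|--------|
| Op-code | Ra     | Rb     | Rc     | unused |

  - (b) addi, andi, ori: I-Format

| 31..27  | 26..23 | 22..19 | 18..0 |
|---------|--------|--------|-------|
| Op-code | Ra     | Rb     | C     |

  - (c) mul, div, neg, not: I-Format

| 31..27  | 26..23 | 22..19 | 18..0  |
|---------|--------|--------|--------|
| Op-code | Ra     | Rb     | unused |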

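- Branch instructions: brzr, brnz, brmi, brpl: B-Format

| 31..27  | 26..23 | 22..19 | 18..0 |
|---------|--------|--------|-------|
| Op-code | Ra     | C2     | C     |

- Jump instructions: jr, jal: J-Format

| 31..27  | 26..23 | 22..0  |
|---------|--------|--------|
| Op-code | Ra     | unused |

- Input/Output and MFHI/MFLO instructions: in, out, mfhi, mflo: J-Format

| 31..27  | 26..23 | 22..0  |
|---------|--------|--------|
| Op-code | Ra     | unused |

- Miscellaneous instructions: nop, halt: M-Format

| 31..27  | 26..0  |
|---------|--------|
| Op-code | unused |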

Op-code: specifies the operation to be performed.

Ra, Rb, Rc: 0000: R0, 0001: R1, ..., 1111: R15

C: constant (data or address)

C2: condition

- `--00`: branch if zero
- `--01`: branch if nonzero
- `--10`: branch if positive
- `--11`: branch if negative

Notation:

- `x`: 0 or 1
- `-`: unused
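The field boundaries above translate directly into shifts and masks. The sketch below extracts every field from a 32-bit instruction word; the function names `decode` and `sign_extend` and the dictionary keys are hypothetical. Which fields are meaningful depends on the instruction format, so a caller would only look at the ones its op-code uses.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its candidate fields."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 27) & 0x1F,             # bits 31..27
        "ra": (word >> 23) & 0xF,              # bits 26..23
        "rb": (word >> 19) & 0xF,              # bits 22..19 (R- and I-Format)
        "rc": (word >> 15) & 0xF,              # bits 18..15 (R-Format)
        "c2": (word >> 19) & 0x3,              # low two bits of 22..19 (B-Format)
        "c": sign_extend(word & 0x7FFFF, 19),  # bits 18..0, sign-extended
    }


# 10010 0101 0001 followed by C = 4: op-code 10010, Ra = R5, C2 = --01
fields = decode(0x92880004)
print(fields["op"], fields["ra"], fields["c2"], fields["c"])  # 18 5 1 4
```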

#### 2.2 Instructions

The instructions (with their op-code patterns shown in parentheses) perform the following operations:

#### **Load and Store Instructions**

ld, ldi, st:

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| ld: Load Direct<br>(`00000xxxx0000xxxxxxxxxxxxxxxxxxx`) | R[Ra] ← M[C (sign-extended)]<br>Direct addressing, Rb = R0 | ld Ra, C |
| ld: Load Indexed/Register Indirect<br>(`00000xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | R[Ra] ← M[R[Rb] + C (sign-extended)]<br>Indexed addressing, Rb ≠ R0<br>If C = 0 → Register Indirect addressing | ld Ra, C(Rb) |
| ldi: Load Immediate<br>(`00001xxxx0000xxxxxxxxxxxxxxxxxxx`) | R[Ra] ← C (sign-extended)<br>Immediate addressing, Rb = R0 | ldi Ra, C |
| ldi: Load Immediate<br>(`00001xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | R[Ra] ← R[Rb] + C (sign-extended)<br>Immediate addressing, Rb ≠ R0<br>If C = 0 → instruction acts like a simple register transfer<br>If C ≠ 0 and Ra = Rb → Increment/decrement instruction | ldi Ra, C(Rb) |
| st: Store Direct<br>(`00010xxxx0000xxxxxxxxxxxxxxxxxxx`) | M[C (sign-extended)] ← R[Ra]<br>Direct addressing, Rb = R0 | st C, Ra |
| st: Store Indexed/Register Indirect<br>(`00010xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | M[R[Rb] + C (sign-extended)] ← R[Ra]<br>Indexed addressing, Rb ≠ R0<br>If C = 0 → Register Indirect addressing | st C(Rb), Ra |
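All three instructions share one operand calculation: the sign-extended constant C is added to R[Rb], except that Rb = R0 selects a base of zero. A minimal Python sketch of that behaviour follows. The function names are hypothetical, memory is assumed to be word-addressed, and wrapping addresses to the 512-word memory with a modulo is an assumption, since the specification does not say what happens to out-of-range addresses.

```python
MASK32 = 0xFFFFFFFF


def sign_extend19(c: int) -> int:
    """Sign-extend the 19-bit constant field C."""
    return (c & 0x3FFFF) - (c & 0x40000)


def operand(regs: list, rb: int, c: int) -> int:
    """Return C when Rb = R0, otherwise R[Rb] + C (both sign-extended, 32-bit)."""
    base = 0 if rb == 0 else regs[rb]
    return (base + sign_extend19(c)) & MASK32


def ld(regs: list, mem: list, ra: int, rb: int, c: int) -> None:
    # assumption: out-of-range addresses wrap around the 512-word memory
    regs[ra] = mem[operand(regs, rb, c) % len(mem)]


def ldi(regs: list, ra: int, rb: int, c: int) -> None:
    regs[ra] = operand(regs, rb, c)


def st(regs: list, mem: list, ra: int, rb: int, c: int) -> None:
    mem[operand(regs, rb, c) % len(mem)] = regs[ra]


regs, mem = [0] * 16, [0] * 512
ldi(regs, 1, 0, 0x75)      # ldi R1, 0x75     -> R1 = 0x75
ldi(regs, 1, 1, 0x7FFFF)   # ldi R1, -1(R1)   -> R1 = 0x74 (decrement)
st(regs, mem, 1, 1, 2)     # st  2(R1), R1    -> M[0x76] = 0x74
ld(regs, mem, 2, 0, 0x76)  # ld  R2, 0x76     -> R2 = 0x74
```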

#### **Arithmetic and Logical Instructions**

(a): add, sub, and, or, shr, shl, ror, rol

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| add: Add<br>(`00011xxxxxxxxxxxx---------------`) | R[Ra] ← R[Rb] + R[Rc] | add Ra, Rb, Rc |
| sub: Sub<br>(`00100xxxxxxxxxxxx---------------`) | R[Ra] ← R[Rb] - R[Rc] | sub Ra, Rb, Rc |
| shr: Shift Right<br>(`00101xxxxxxxxxxxx---------------`) | Shift right R[Rb] into R[Ra] by count in R[Rc] | shr Ra, Rb, Rc |
| shl: Shift Left<br>(`00110xxxxxxxxxxxx---------------`) | Shift left R[Rb] into R[Ra] by count in R[Rc] | shl Ra, Rb, Rc |
| ror: Rotate Right<br>(`00111xxxxxxxxxxxx---------------`) | Rotate right R[Rb] into R[Ra] by count in R[Rc] | ror Ra, Rb, Rc |
| rol: Rotate Left<br>(`01000xxxxxxxxxxxx---------------`) | Rotate left R[Rb] into R[Ra] by count in R[Rc] | rol Ra, Rb, Rc |
| and: AND<br>(`01001xxxxxxxxxxxx---------------`) | R[Ra] ← R[Rb] ∧ R[Rc] | and Ra, Rb, Rc |
| or: OR<br>(`01010xxxxxxxxxxxx---------------`) | R[Ra] ← R[Rb] ∨ R[Rc] | or Ra, Rb, Rc |

(b): addi, andi, ori

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| addi: Add Immediate<br>(`01011xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | R[Ra] ← R[Rb] + C (sign-extended)<br>Immediate addressing<br>If C = 0 → instruction acts like a simple register transfer<br>If C ≠ 0 and Ra = Rb → Increment/decrement instruction<br>Similar to ldi, however Rb can be a… | addi Ra, Rb, C |
| andi: AND Immediate<br>(`01100xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | R[Ra] ← R[Rb] ∧ C (sign-extended)<br>Immediate addressing | andi Ra, Rb, C |
| ori: OR Immediate<br>(`01101xxxxxxxxxxxxxxxxxxxxxxxxxxx`) | R[Ra] ← R[Rb] ∨ C (sign-extended)<br>Immediate addressing | ori Ra, Rb, C |
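When writing testbenches it is often necessary to hand-assemble a few of these instructions into 32-bit words. The sketch below packs the R-Format and I-Format fields using the bit positions from Section 2.1; the helper names `encode_r` and `encode_i` are hypothetical, and unused bits are simply written as zero, which is an assumption since their value is not constrained.

```python
ADD = 0b00011    # op-codes taken from the tables above
ADDI = 0b01011


def encode_r(op: int, ra: int, rb: int, rc: int) -> int:
    """R-Format: op-code 31..27, Ra 26..23, Rb 22..19, Rc 18..15, rest unused."""
    return (op << 27) | (ra << 23) | (rb << 19) | (rc << 15)


def encode_i(op: int, ra: int, rb: int, c: int) -> int:
    """I-Format: op-code 31..27, Ra 26..23, Rb 22..19, 19-bit constant C 18..0."""
    return (op << 27) | (ra << 23) | (rb << 19) | (c & 0x7FFFF)


print(hex(encode_r(ADD, 1, 2, 3)))     # add  R1, R2, R3  -> 0x18918000
print(hex(encode_i(ADDI, 1, 2, -1)))   # addi R1, R2, -1  -> 0x5897ffff
```

Masking C with `0x7FFFF` stores a negative constant in 19-bit 2's complement form, which is what the sign extension in the operation column expects.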

(c): mul, div, neg, not

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| mul: Multiply<br>(`01110xxxxxxxx-------------------`) | HI, LO ← R[Ra] × R[Rb] | mul Ra, Rb |
| div: Divide<br>(`01111xxxxxxxx-------------------`) | HI, LO ← R[Ra] ÷ R[Rb] | div Ra, Rb |
| neg: Negate<br>(`10000xxxxxxxx-------------------`) | R[Ra] ← -R[Rb] | neg Ra, Rb |
| not: NOT<br>(`10001xxxxxxxx-------------------`) | R[Ra] ← ¬R[Rb] | not Ra, Rb |
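Because mul and div write two registers, it is worth being explicit about which half goes where: HI receives the high-order product word or the remainder, and LO receives the low-order product word or the quotient. The Python sketch below models this. It assumes signed (2's complement) operands and a quotient truncated toward zero, neither of which is stated in this section, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a: int, b: int) -> tuple:
    """Return (HI, LO) of the 64-bit signed product."""
    product = (to_signed(a) * to_signed(b)) & MASK64
    return product >> 32, product & MASK32


def div(a: int, b: int) -> tuple:
    """Return (HI, LO) = (remainder, quotient), quotient truncated toward zero."""
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        raise ZeroDivisionError("divide-by-zero behaviour is not specified")
    quotient = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    remainder = sa - quotient * sb
    return remainder & MASK32, quotient & MASK32


print([hex(v) for v in mul(0xFFFFFFFF, 2)])  # -1 * 2: ['0xffffffff', '0xfffffffe']
print(div(7, 2))                             # (1, 3): remainder 1, quotient 3
```

A subsequent mfhi or mflo would then copy the first or second element of the pair into R[Ra].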

#### **Conditional Branch Instructions**

brzr, brnz, brmi, brpl

Branch (`10010xxxxxxxxxxxxxxxxxxxxxxxxxxx`): PC ← PC + 1 + C (sign-extended) if R[Ra] meets the condition

| Condition (C2) | Meaning            | Assembly language |
|----------------|--------------------|-------------------|
| `--00`         | branch if zero     | brzr Ra, C        |
| `--01`         | branch if nonzero  | brnz Ra, C        |
| `--10`         | branch if positive | brpl Ra, C        |
| `--11`         | branch if negative | brmi Ra, C        |
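The branch decision depends only on the two low bits of C2 and the value in R[Ra]. The sketch below shows one way to model it. It assumes that "positive" means the sign bit is clear (so zero counts as positive) and that a branch that is not taken simply continues at PC + 1; both are assumptions, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend19(c: int) -> int:
    """Sign-extend the 19-bit constant field C."""
    return (c & 0x3FFFF) - (c & 0x40000)


def condition_met(c2: int, value: int) -> bool:
    """Evaluate the C2 condition against the 32-bit value of R[Ra]."""
    value &= MASK32
    negative = bool(value & 0x80000000)
    c2 &= 0b11                      # the upper two bits of the field are unused
    if c2 == 0b00:
        return value == 0           # brzr
    if c2 == 0b01:
        return value != 0           # brnz
    if c2 == 0b10:
        return not negative         # brpl (assumption: zero counts as positive)
    return negative                 # brmi


def next_pc(pc: int, c2: int, ra_value: int, c: int) -> int:
    """PC + 1 + C when the condition holds, otherwise PC + 1 (assumed)."""
    offset = sign_extend19(c) if condition_met(c2, ra_value) else 0
    return (pc + 1 + offset) & MASK32


print(next_pc(10, 0b00, 0, 5))   # brzr taken:     16
print(next_pc(10, 0b01, 0, 5))   # brnz not taken: 11
```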

#### **Jump Instructions**

jr, jal

#### **Input/Output and MFHI/MFLO Instructions**

in, out, mfhi, mflo

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| in: Input<br>(`10101xxxx-----------------------`) | R[Ra] ← In.Port | in Ra |
| out: Output<br>(`10110xxxx-----------------------`) | Out.Port ← R[Ra] | out Ra |
| mfhi: Move from HI<br>(`10111xxxx-----------------------`) | R[Ra] ← HI | mfhi Ra |
| mflo: Move from LO<br>(`11000xxxx-----------------------`) | R[Ra] ← LO | mflo Ra |

#### **Miscellaneous Instructions**

nop, halt

| Instruction (op-code pattern) | Operation | Assembly language |
|-------------------------------|-----------|-------------------|
| nop: No-operation<br>(`11001---------------------------`) | Do nothing | nop |
| halt: Halt | Halt the control stepping process | halt |

### 3. Design Phases and Final Report

There are four design phases in this project, and you are strongly advised to devote sufficient time to your project. You may use an all-HDL design approach, or you may opt for a mixed Schematic/HDL design approach for your CPU. A mixed VHDL/Verilog design methodology is not recommended.

**Phase 1:** The Datapath will be partially designed and tested using Functional Simulation. Phase one is worth 7% of the course mark.

You will demo your design and simulation results, and hand in a hard copy report to your TA consisting of VHDL/Verilog code, schematic screenshot (if any), along with testbenches and Functional Simulation results.

**Phase 2:** The Datapath will be complemented by adding the "Select and Encode logic", "CON FF Logic", branch and jump instructions, "Memory Subsystem", and the "Input/Output Ports". It will be tested using Functional Simulation. Phase two is worth 7% of the course mark.

You will demo your design and simulation results, and hand in a hard copy report to your TA consisting of VHDL/Verilog code, schematic screenshot (if any), along with testbenches and Functional Simulation results.

**Phase 3:** The Control Unit will be designed in VHDL or Verilog, and tested using Functional Simulation. You will be provided with a test program to verify your Control Unit. Phase three is worth 5% of the course mark.

You will demo your design and simulation results, and hand in a hard copy report to your TA consisting of your VHDL or Verilog code, schematic screenshot (if any), along with testbenches and Functional Simulation results.

**Phase 4:** The Datapath and Control Unit will be tested together using both Functional Simulation and implementation on the DE0 or DE0-CV evaluation board. You will be provided with a test program to verify your CPU. Phase four is worth 3% of the course mark.

You will demo your design, simulation, and on-board results to your TA, including your VHDL or Verilog code, schematic screenshot (if any), testbenches, and Functional Simulation results. There is no report for this phase.

**Final CPU design project report:** You will submit a comprehensive final soft copy report to the course instructor detailing your CPU design project, including performance results and analysis, discussions, and conclusions. The Final report is worth 3% of the course mark.

### 4. Schedule

The deadlines for each phase and the Final Report are shown in the following table and on the course website.

|                         | Deadline | Automatic extension by one week without penalty | 25% Penalty | 40% Penalty |
|-------------------------|----------|-------------------------------------------------|-------------|-------------|
| Phase 1 demo and report | Feb 3 and Feb 10 (Monday Session)<br>Feb 4 and Feb 11 (Tuesday Session) | Feb 24 (Monday Session)<br>Feb 25 (Tuesday Session) | Mar 2 (Monday Session)<br>Mar 3 (Tuesday Session) | Mar 30 (Monday Session)<br>Mar 31 (Tuesday Session) |
| Phase 2 demo and report | Mar 2 and Mar 9 (Monday Session)<br>Mar 3 and Mar 10 (Tuesday Session) | Mar 16 (Monday Session)<br>Mar 17 (Tuesday Session) | Mar 23 (Monday Session)<br>Mar 24 (Tuesday Session) | Mar 30 (Monday Session)<br>Mar 31 (Tuesday Session) |
| Phase 3 demo and report | Mar 16 and Mar 23 (Monday Session)<br>Mar 17 and Mar 24 (Tuesday Session) | Mar 30 (Monday Session)<br>Mar 31 (Tuesday Session) | - | - |
| Phase 4 demo            | Mar 23 and Mar 30 (Monday Session)<br>Mar 24 and Mar 31 (Tuesday Session) | - | - | - |
| Final report            | Apr 3 | - | - | - |
### 5. Bonus Mark

Please note that our minimum goal is a one-bus architecture for the Mini SRC. However, a bonus of up to 5% is available if you design and implement, for instance, a 3-bus architecture, new instructions, interrupt/exception handling, VHDL/Verilog implementations of advanced techniques for ALU operations, or other advanced techniques that improve performance, such as pipelining, branch prediction, hazard detection, and superscalar design. You are advised to tackle the 1-bus architecture first and, once you feel comfortable with your design, aim for further improvements.