# 5-stage Pipelined Processor Design Report

Team #4

Mohamed Shawky SEC:2, BN:16

Remonda Talaat SEC:1, BN:20

Evram Youssef SEC:1, BN:9

Mahmoud Adas SEC:2, BN:21

March 30, 2020

# Contents

| Ι  | Int | troduction                   | 1 |
|----|-----|------------------------------|---|
|    | 0.1 | System Overview              | 2 |
|    | 0.2 | Task Distribution            | 2 |
| II | O   | verall System                | 3 |
|    | 0.3 | Overall System Design Schema | 4 |
|    | 0.4 | Memory Specs                 | 4 |
|    | 0.5 | PC Control Unit              | 5 |
|    |     | 0.5.1 Inputs                 | 5 |
|    |     | 0.5.2 Outputs                | 5 |
|    |     | 0.5.3 Logic                  | 5 |
|    | 0.6 | Dynamic Branch Prediction    | 5 |
|    | 0.7 | Branch Address Unit          | 6 |
|    |     | 0.7.1 Inputs                 | 6 |
|    |     | 0.7.2 Outputs                | 6 |
|    |     | 0.7.3 Logic                  | 6 |
|    | 0.8 | Register File                | 6 |
|    |     | 0.8.1 Registers              | 6 |
|    |     | 0.8.2 Inputs                 | 7 |
|    |     | 0.8.3 Outputs                | 7 |
|    |     | 0.8.4 Logic                  | 7 |
|    | 0.9 | ALU                          | 7 |
|    |     | 0.9.1 Inputs                 | 7 |
|    |     | 0.9.2 Outputs                | 7 |
|    |     | 0.9.3 ALU Operations         | 8 |
|    |     | 0.9.4 Logic                  | 8 |

| III Instruction Format                    | 10   |
|-------------------------------------------|------|
| 0.10 One Operand Operations               | . 11 |
| 0.11 Special Operations                   | . 11 |
| 0.12 Two Operand Operations               | . 11 |
| 0.13 Memory Operations                    | . 12 |
| 0.14 Branch and Change Control Operations | . 13 |
| IV Control Signals                        | 14   |
| V Pipeline Stages                         | 16   |
| 0.15 Overview                             | . 17 |
| 0.15.1 Fetch Stage                        |      |
| 0.15.2 Decode Stage                       |      |
| 0.15.3 Execute Stage                      | . 17 |
| 0.15.4 Memory Stage                       |      |
| 0.15.5 Write-Back Stage                   | . 18 |
| 0.16 IF/ID Buffer                         | . 18 |
| 0.16.1 Registers                          |      |
| 0.16.2 Control Signals                    | . 18 |
| 0.17 ID/EX Buffer                         |      |
| 0.17.1 Registers                          |      |
| 0.17.2 Control Signals                    |      |
| 0.18 EX/M Buffer                          |      |
| 0.18.1 Registers                          |      |
| 0.18.2 Control Signals                    |      |
| 0.19 M/WB Buffer                          |      |
| 0.19.1 Registers                          | . 19 |
| VI Pipeline Hazards and solutions         | 20   |
| 0.20 Structural Hazards                   |      |
| 0.20.1 Detection                          |      |
| 0.20.2 Handling                           |      |
| 0.21 Data Hazards                         |      |
| 0.21.1 Detection                          |      |
| 0.21.2 Handling                           |      |

| 0.0                  | 5-stage Pipelined Processor Design Re | port |
|----------------------|---------------------------------------|------|
|                      |                                       |      |
| 0.22 Control Hazards |                                       | 22   |
| 0.22.1 Detection     |                                       | 22   |
| 0.22.2 Handling      |                                       | 23   |

# List of Figures

| 1 | Overall System Design         | 4  |
|---|-------------------------------|----|
| 2 | Register File Diagram         | 6  |
| 3 | Hazard Detection Unit Diagram | 21 |

# List of Tables

| 1 | One Operand Instruction Mapping |  |  |  |  |  |  |  | 11 |
|---|---------------------------------|--|--|--|--|--|--|--|----|
| 2 | Two Operand Instruction Mapping |  |  |  |  |  |  |  | 12 |
| 3 | Memory Instruction Mapping      |  |  |  |  |  |  |  | 13 |
| 4 | One Operand Instruction Mapping |  |  |  |  |  |  |  | 13 |

# Part I Introduction

## 0.1 System Overview

This document reports our design work of the 5-stage pipelined processor using Harvard architecture. We discuss the overall system blocks and connections, the functionalities of the different blocks and the hazard solutions.

#### 0.2 Task Distribution

TODO: Task distribution table

# Part II Overall System

# 0.3 Overall System Design Schema



Figure 1: Overall System Design

TODO(1): Add branch predictor design

TODO(2): Add Interrupt handling design

TODO(3): Add stack pointer related instructions design

TODO(4): Add 32-bit fetch detection unit

### 0.4 Memory Specs

- We have 2 separate memory units, one for instructions and another for data and stack.
- Instruction Memory:
  - $-2^{32} X 16 bits$

- 16-bit bus
- Data Memory:
  - $-2^{32} X 16 bits$
  - 32-bit bus
  - SP starts at  $2^{32}$ -1

#### 0.5 PC Control Unit

#### **0.5.1** Inputs

- IF Flush (1 bit)
- Stall Signal (1 bit)
- RESET Signal (1 bit)

#### 0.5.2 Outputs

• PC Mux Selectors (2 bits)

#### 0.5.3 Logic

- If IF Flush == 1, Output = 01
- If RESET == 1, Output = 10
- If Stall == 1, Output = 11
- Else, Output = 00

#### 0.6 Dynamic Branch Prediction

TODO: Add branch prediction details in both overall diagram and section

#### 0.7 Branch Address Unit

#### 0.7.1 Inputs

- PC Next Address (32 bits)
- Instruction Address (32 bits)
- OpCode (7 bits)

#### 0.7.2 Outputs

- IF Flush (1 bit)
- Branch Address (32 bits)

#### 0.7.3 Logic

- Check if OpCode is of a branch instruction, if true:
  - Check whether PC Next Address is equal to Instruction Address
  - If true:
    - \* IF Flush = 0, Branch Address = Instruction Address
  - If false:
    - \* IF Flush = 1, Branch Address = Instruction Address

#### 0.8 Register File

#### 0.8.1 Registers

- 8 general purpose registers
- Stack pointer (SP) register
- Program counter (PC) register

#### 0.8.2 Inputs

- Dest Regs: 2X4 bits (for destination selection)
- SRC Regs: 2X4 bits (for source selection)
- WB values: 2X32 bits (for write back values)
- RESET: 1 bit (for registers clear).
- Branch/IO: 2 bits (to determine whether the operation is IO or branch)
- IN Port: 32 bits (IO input port)

#### 0.8.3 Outputs

- OP1: 32 bits (value of first operand)
- OP2: 32 bits (value of second operand)
- Instruction Address: 32 bits (value of branch address)
- OUT Port: 32 bits (IO output port)

#### 0.8.4 Logic

The register selector acts like a decoder to select the required operation and the register on which the operation performed.

#### 0.9 ALU

#### 0.9.1 Inputs

- ALUop: 4 bits (refer to ALU Operations below)
- Operands: 2X32 bits (2 input operands)

#### 0.9.2 Outputs

• ALUout: 32 bits (operation result)

#### 0.9.3 ALU Operations

- NOP 0000
- INC 0001
- DEC -0010
- ADD 0011
- SUB 0100
- AND -0101
- OR 0110
- NOT 0111
- SHL 1000
- SHR -1001

#### 0.9.4 Logic

- ALU performs the operation and changes the CCR accordingly.
- The input operands of the ALU are multiplexed between forwarded data and register data, with selectors from data forwarding unit.



Figure 2: Register File Diagram

# Part III Instruction Format

#### 0.10 One Operand Operations

- 4 bits (1111) for one operand instructions.
- 3 bits to define instruction.
- 3 bits for destination register.
- 1 bit to define the memory slots occupied by the instruction.
- Total of 11 bits, padded with 5 0's to fit 16 bits.

| Table 1. One Operand Instruction Mapping |         |             |       |                   |  |  |  |  |  |  |
|------------------------------------------|---------|-------------|-------|-------------------|--|--|--|--|--|--|
| Operation                                | OpCode  | Destination | 16 32 | Conditions        |  |  |  |  |  |  |
| IN                                       | 1111000 | 000:111     | 0     |                   |  |  |  |  |  |  |
| NOT                                      | 1111001 | 000:111     | 0     | if $!Rdst=0,Z=1$  |  |  |  |  |  |  |
|                                          |         |             |       | if $!Rdst<0,N=1$  |  |  |  |  |  |  |
| INC                                      | 1111010 | 000:111     | 0     | if Rdst+1=0,Z=1   |  |  |  |  |  |  |
|                                          |         |             |       | if $Rdst+1<0,N=1$ |  |  |  |  |  |  |
| DEC                                      | 1111011 | 000:111     | 0     | if Rdst-1=0,Z=1   |  |  |  |  |  |  |
|                                          |         |             |       | if Rdst-1<0,N=1   |  |  |  |  |  |  |
| OUT                                      | 1111100 | 000:111     | 0     |                   |  |  |  |  |  |  |

Table 1: One Operand Instruction Mapping

# 0.11 Special Operations

• 16 0's to represent NOP (0000000000000000).

#### 0.12 Two Operand Operations

- 4 bits to define instruction.
- 3 bits for each of Rsrc1, Rsrc2 and Rdst.
- 1 bit to define the memory slots occupied by the instruction.
- 16 bits for immediate values.
- Total of 14 bits in most cases with some exceptions mentioned below.

| Operation | OpCode | Rsrc1   | Rsrc2   | Rdst    | imm     | 16 32 | Conditions      |  |  |  |
|-----------|--------|---------|---------|---------|---------|-------|-----------------|--|--|--|
| SWAP      | 0001   | 000:111 |         | 000:111 | _       | 0     |                 |  |  |  |
|           |        |         |         |         |         |       | _               |  |  |  |
| ADD       | 0010   | 000:111 | 000:111 | 000:111 |         | 0     | if Result=0,Z=1 |  |  |  |
|           |        |         |         |         |         |       | if Result<0,N=1 |  |  |  |
| SUB       | 0011   | 000:111 | 000:111 | 000:111 | _       | 0     | if Result=0,Z=1 |  |  |  |
|           |        |         |         |         |         |       | if Result<0,N=1 |  |  |  |
| AND       | 0100   | 000:111 | 000:111 | 000:111 | _       | 0     | if Result=0,Z=1 |  |  |  |
|           |        |         |         |         |         |       | if Result<0,N=1 |  |  |  |
| OR        | 0101   | 000:111 | 000:111 | 000:111 | _       | 0     | if Result=0,Z=1 |  |  |  |
|           |        |         |         |         |         |       | if Result<0,N=1 |  |  |  |
| SHL       | 0110   | 000:111 | -       | _       | 16 bits | 1     | update carry    |  |  |  |
|           |        |         |         |         |         |       | flag            |  |  |  |
| SHR       | 0111   | 000:111 | -       | _       | 16 bits | 1     | update carry    |  |  |  |
|           |        |         |         |         |         |       | flag            |  |  |  |
| IADD      | 1000   | 000:111 | _       | 000:111 | 16 bits | 1     | if Result=0,Z=1 |  |  |  |
|           |        |         |         |         |         |       | if Result<0,N=1 |  |  |  |

Table 2: Two Operand Instruction Mapping

# 0.13 Memory Operations

- 4 bits to define instruction.
- 3 bits for destination register.
- 1 bit to define the memory slots occupied by the instruction.
- 16 bits for immediate values.
- 20 bits for effective addresses.
- Total of 8 bits with no immediate values or effective addresses.
- Total of 24 bits with immediate values.
- Total of 28 bits with effective addresses.

Rdst Conditions Operation OpCode EA16|32imm PUSH 1001 000:1110 POP 1010 000:111 0 LDM 1011 000:111 1 16 bits LDD 1100 000:111 20 bits 1 STD 20 bits 1 1101 000:111

Table 3: Memory Instruction Mapping

## 0.14 Branch and Change Control Operations

- 4 bits (0000) for branching instructions.
- 3 bits to define instruction.
- 3 bits for destination register.
- 1 bit to define the memory slots occupied by the instruction.
- Total of 11 bits, padded with 5 0's to fit 16 bits.

Table 4: One Operand Instruction Mapping

| Operation | OpCode  | Destination | 16 32 | Conditions |
|-----------|---------|-------------|-------|------------|
| JZ        | 0000001 | 000:111     | 0     |            |
| JMP       | 0000010 | 000:111     | 0     |            |
| CALL      | 0000011 | 000:111     | 0     |            |
| RET       | 0000100 | _           | 0     |            |
| RTI       | 0000101 |             | 0     |            |

# Part IV Control Signals

TODO: Add related control signals for each instruction

# Part V Pipeline Stages

#### 0.15 Overview

This section discusses the 5 stages of our system and their functionalities.

#### 0.15.1 Fetch Stage

- Responsible for fetching the next instruction.
- Can take two cycles in case of 32-bit instructions.
- Contains a branch prediction unit to determine the next address to be fetched in case of branching.
- Outputs the instruction bits into IF/ID Buffer.

#### 0.15.2 Decode Stage

- Responsible for decoding the instruction bits into control signals.
- Outputs the corresponding signals to ID/EX Buffer.
- Contains register file to output operand values and register-related operations.
- Determines the correct branch address in case of branching instructions by using Branch Address Unit.

#### 0.15.3 Execute Stage

- Responsible for ALU operations.
- Determines the correct ALU output and pass it with other signals to EX/M Buffer.

#### 0.15.4 Memory Stage

• Responsible for Data Memory IO.

#### 0.15.5 Write-Back Stage

Responsible for passing correct output values to the destination registers.

## 0.16 IF/ID Buffer

#### 0.16.1 Registers

- Instruction Register (32 bits)
- Next Address Register (32 bits)

#### 0.16.2 Control Signals

- Flush: clear buffer (1 bit)
- Stall: freeze buffer (1 bit)

## 0.17 ID/EX Buffer

#### 0.17.1 Registers

- Operand Registers (2X32 bits)
- Destination Register (4 bits)
- OpCode Register (7 bits)

#### 0.17.2 Control Signals

- Stall (IN): freeze buffer (1 bit)
- Destination Register (OUT) (4 bits)
- $\bullet$  Output Values (OUT) (32 bits)

## 0.18 EX/M Buffer

#### 0.18.1 Registers

- ALUout Register (32 bits)
- MEM IN Register (32 bits)
- Opcode Register (7 bits)
- Destination Register (4 bits)

#### 0.18.2 Control Signals

- Destination Register (OUT) (4 bits)
- Output Values (OUT) (32 bits)

## 0.19 M/WB Buffer

#### 0.19.1 Registers

- ALUout Register (32 bits)
- MEM OUT (32 bits)
- OpCode (7 bits)
- Destination Register (4 bits)

# 

#### 0.20 Structural Hazards

#### 0.20.1 Detection

The structural hazard occurs in data memory and register file.

#### 0.20.2 Handling

The structural hazard in data memory is solved by using 2 memory units, one for instructions and one for data. Both have the same specs (previously mentioned).

However structural hazard in register file is handled by forcing the write back to happen in the first half of the clock cycle and the decode to happen in the second half.

#### 0.21 Data Hazards



Figure 3: Hazard Detection Unit Diagram

#### 0.21.1 Detection

#### Hazard Detection Unit (HDU)

HDU consists of 3 parts:

- OPCode Checker: checks the opcode of the current instruction to check whether it will cause data hazard or not, then activates the Register Comparator accordingly.
- Register Comparator: compares the decode source registers with the destination registers of the execute and memory stages.
- Output Unit: outputs stall signal in case of load and pop instructions and data forward unit enable in case of other data hazards.

#### 0.21.2 Handling

#### Stall

Occurs only at Fetch and Decode stage, due to load(pop) use case.

- Fetch same instruction (don't increment the program counter).
- Latch IF/ID buffer with the same values.
- Freeze Decode stage.
- Clear ID/EX buffer.

#### **Data Forwarding**

- EX/MEM buffer -> Execute / Decode.
- ID/EX buffer > Decode.

#### 0.22 Control Hazards

#### 0.22.1 Detection

The branch address calculation occurs in the Decode stage. So, the hazard might affect only the Fetch stage, which will be flushed in case of wrong address prediction.

#### 0.22.2 Handling

- At Fetch stage, always check the branch predictor and calculate the next address accordingly.
- At Decode stage, we have a *Branch Address Unit* that checks whether the OPCode is of a branch operation. If so, it passes the address to the program counter and compares the correct address with the address of the counter to decide whether to flush the Fetch stage or not.

#### Flush

Occurs only at Fetch Stage, due to wrong branch prediction at Decode stage.

- Load new address in the program counter.
- Remove fetched instructions from IF/ID buffer.

#### **Dynamic Branch Prediction**

We use 2-bit branch predictor, which is a *Finite State Machine* (FSM) to predict whether the branch will be taken (1) or not (0).