# Pipelined CPU

Jinyang Li

Based on the slides of Patterson and Hennessy

#### What we've learnt so far

- Combinatorial logic
  - ALU
- Sequential logic
  - Clocks and basic state elements (SR latch, D latch, flip-flop)
- The single-cycle CPU design

# Single-cycle CPU uses a slow clock



Clock cycle >= 800 ps

#### Pipelining: a laundry analogy



#### RISC-V Pipeline

- Five stages, one step per stage
  - 1. IF: Instruction fetch from memory
  - 2. ID: Instruction decode & register read
  - 3. EX: Execute operation or calculate address
  - 4. MEM: Access memory operand
  - 5. WB: Write result back to register



#### Pipeline Speedup

- Pipelining increases throughput (instructions/sec)
  - Latency (time for each instruction) does not decrease
- If all stages are balanced (i.e., all take the same time)
  - throughput<sub>pipelined</sub> = number-of-stages \* throughput<sub>nonpipelined</sub>
  - If not balanced, speedup is less

Throughput = 1/time between instructions

#### Pipelining and ISA Design

- RISC-V ISA designed for pipelining
  - All instructions are 32-bits
    - Easier to fetch and decode in one cycle
    - c.f. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3<sup>rd</sup> stage, access memory in 4<sup>th</sup> stage

#### Pipeline challenges: hazards

- Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

#### Structure Hazards

- Conflict use of a single resource
- Example: Suppose CPU uses a single memory
  - Load/store requires data access
  - Instruction fetch would have to *stall* for that cycle
    - Would cause a pipeline "bubble"
- Solution: Use separate instruction/data memories

#### **Data Hazards**

 An instruction depends on the previous instruction to complete its data read/write



#### Control hazard

 Wait until branch outcome determined before fetching next instruction



# A basic pipelined RISC-V CPU

# **Pipelined Datapath**



## Pipeline registers

Needed to hold information produced in previous cycle



#### IF for Load, Store

Instruction fetch

Single-clock-cycle diagram shows the state of an entire datapath during a single clock cycle



# ID for Load, Store, ...

Instruction decode



# ld Execution

#### **EX for Load**



# **MEM** for Load





#### **WB** for Load





# **Corrected Datapath for Load**



#### **EX for Store**





#### **MEM for Store**





#### **WB** for Store





# Multi-Cycle Pipeline Diagram

#### • Traditional form

Program execution order (in instructions)

| ld x10, 40(x1)  |  |  |  |  |  |
|-----------------|--|--|--|--|--|
| sub x11, x2, x3 |  |  |  |  |  |
| add x12, x3, x4 |  |  |  |  |  |
| ld x13, 48(x1)  |  |  |  |  |  |
| add x14, x5, x6 |  |  |  |  |  |

| Instruction fetch | Instruction decode | Execution          | Data<br>access     | Write-back         |                    |                |                |            |
|-------------------|--------------------|--------------------|--------------------|--------------------|--------------------|----------------|----------------|------------|
|                   | Instruction fetch  | Instruction decode | Execution          | Data<br>access     | Write-back         |                |                |            |
|                   |                    | Instruction fetch  | Instruction decode | Execution          | Data<br>access     | Write-back     |                |            |
|                   |                    |                    | Instruction fetch  | Instruction decode | Execution          | Data<br>access | Write-back     |            |
|                   |                    | ,                  |                    | Instruction fetch  | Instruction decode | Execution      | Data<br>access | Write-back |

# Multi-Cycle Pipeline Diagram

• Form showing resource usage cc 5 cc 6 cc 7 cc 8 cc 9



# Single-Cycle Pipeline Diagram • State of pipeline in a given cycle





# Pipelined Control (Simplified)



# **Pipelined Control**



#### Data Hazards in ALU Instructions

Example instruction sequence:

```
sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)
```

- Solution: forwarding (aka bypassing)
  - Use result when it's computed; don't wait for it to be stored in register

### **Dependencies & Forwarding**





#### Detecting the Need to Forward

- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs1: register number for Rs1 in ID/EX pipeline register
- ALU operand register numbers (in EX stage) are:
  - ID/EX.RegisterRs1
  - ID/EX.RegisterRs2
- Data hazards when
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2

EX-hazard: Fwd from EX/MEM pipeline reg

Mem-hazard: Fwd from MEM/WB pipeline reg

#### Detecting the Need to Forward

- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not x0
  - EX/MEM.RegisterRd ≠ 0,
     MEM/WB.RegisterRd ≠ 0

# **Forwarding Paths**



#### **Double Data Hazard**

• Consider the sequence:

```
add x1,x1,x2
add x1,x1,x3
add x1,x1,x4
```

- Both hazards occur
  - Want to use the most recent
- Revise MEM hazard condition
  - Only fwd if EX hazard condition isn't true



#### Load-Use Hazard Detection

Load-use hazard cannot be resolved using forwarding alone



- Check load-use hazard using condition
  - ID/EX.MemRead and ((ID/EX.RegisterRd = IF/ID.RegisterRs1) or (ID/EX.RegisterRd = IF/ID.RegisterRs1))
- If detected, stall and insert bubble

#### How to Stall the Pipeline

- Force control values in ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for 1d
    - Can subsequently forward to EX stage

#### Load-Use Data Hazard







#### **Summary**

- Pipeline increases throughput by overlapping execution of multiple instructions
- Pipeline hazard
  - Structure (solution: add resources)
  - Data (solution: forwarding)
  - Control (next class)
- Pipeline stalls