# 18-447 Lecture 4: Single-Cycle Microarchitecture

James C. Hoe

Department of ECE

Carnegie Mellon University

### Housekeeping

- Your goal today
  - first try at implementing the RV32I ISA
- Notices
  - Student survey on Canvas, due Wednesday
  - Handout #4: HW1, due noon 2/7
  - Lab 1, Part A, due Week 3
  - Lab 1, Part B, due Week 4
- Readings
  - P&H Ch 4.1~4.4
  - finish reading P&H Ch 2

### **Instruction Processing FSM**



- state = program visible state
- state transition = instruction execution
- Nice ISAs have atomic instruction semantics
  - one state transition per instruction in abstract FSM
- The implementation FSM can look wildly different

# Program Visible State (aka Architectural State)



## "Magic" Memory and Register File

- Combinational Read
  - output of the read data port is a combinational function of the register file contents and the corresponding read select port
- Synchronous write
  - the selected register (or memory location) is updated on the posedge clock transition when write enable is asserted
    - Cannot affect read output in between clock edges
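
The read/write timing above can be sketched as a small behavioral model. This is only an illustration: the class name `RegFile` and its methods are hypothetical, and the hardwiring of register 0 to zero is an RV32I detail not stated on this slide.

```python
class RegFile:
    """Hypothetical model: combinational read, synchronous write."""

    def __init__(self, nregs=32):
        self.regs = [0] * nregs
        # Write-port inputs; they are only sampled at the next clock edge.
        self.we, self.wsel, self.wdata = False, 0, 0

    def read(self, rsel):
        # Combinational read: a function of current contents and the select input.
        return self.regs[rsel]

    def set_write_port(self, we, wsel, wdata):
        # Driving the write port alone never changes what read() returns.
        self.we, self.wsel, self.wdata = we, wsel, wdata

    def posedge(self):
        # Synchronous write: contents change only here, and only if enabled.
        # Assumption: register 0 is hardwired to zero, as in RV32I.
        if self.we and self.wsel != 0:
            self.regs[self.wsel] = self.wdata & 0xFFFFFFFF
```

Between calls to `posedge()`, any number of reads return the old contents, which is what lets a single-cycle datapath read and write the same register in one instruction.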

### Simplifying Characteristics of "RISC"

- Simple ALU operations
  - 2-input, 1-output arithmetic and logical operations
  - few alternatives for accomplishing the same thing
- Simple data movements
  - ALU ops are register-to-register, never memory
  - "load-store" architecture, 1 addressing mode
- Simple branches
  - limited varieties of branch conditions and targets
  - PC-offset
- Simple instruction encoding
  - all instructions encoded in the same number of bits
  - simple, fixed formats

### **RISC Instruction Processing**

- 5 generic steps
  - instruction fetch (IF)
  - instruction decode and operand fetch (ID)
  - ALU/execute (EX)
  - memory access (MEM) (not required by non-mem instructions)
  - write-back (WB)



# Single-Cycle Datapath for RV32I ALU Instructions

### Register-Register ALU Instructions

Assembly (e.g., register-register addition)

ADD rd, rs1, rs2

Machine encoding

| 0000000 | rs2   | rs1   | 000   | rd    | 0110011 |
|---------|-------|-------|-------|-------|---------|
| 7-bit   | 5-bit | 5-bit | 3-bit | 5-bit | 7-bit   |

- Semantics
  - $GPR[rd] \leftarrow GPR[rs1] + GPR[rs2]$
  - $PC \leftarrow PC + 4$
- Exceptions: none (ignore carry and overflow)
- Variations
  - Arithmetic: {ADD, SUB}
  - Compare: {signed, unsigned} x {Set if Less Than}
  - Logical: {AND, OR, XOR}
  - Shift: {Left, Right-Logical, Right-Arithmetic}
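
A minimal sketch of how the R-type fields and variations map to an ALU function, in the spirit of a C-level simulator. The function names are hypothetical; the funct3 values other than ADD's `000` come from the RV32I encoding tables rather than this slide, and funct7[5] selects SUB and the arithmetic right shift.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    # Reinterpret a 32-bit pattern as a two's-complement integer.
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def decode_rtype(inst):
    # Field positions follow the machine-encoding table (msb on the left).
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1": (inst >> 15) & 0x1F,
        "rs2": (inst >> 20) & 0x1F,
        "funct7": (inst >> 25) & 0x7F,
    }

def alu_rtype(funct3, funct7, a, b):
    alt = (funct7 >> 5) & 1          # funct7[5]: SUB instead of ADD, SRA instead of SRL
    sh = b & 0x1F                    # shift amount is the low 5 bits of the 2nd operand
    if funct3 == 0b000:
        r = a - b if alt else a + b  # SUB / ADD (carry and overflow ignored)
    elif funct3 == 0b001:
        r = a << sh                  # SLL
    elif funct3 == 0b010:
        r = int(to_signed(a) < to_signed(b))    # SLT
    elif funct3 == 0b011:
        r = int((a & MASK32) < (b & MASK32))    # SLTU
    elif funct3 == 0b100:
        r = a ^ b                    # XOR
    elif funct3 == 0b101:
        r = (to_signed(a) >> sh) if alt else ((a & MASK32) >> sh)  # SRA / SRL
    elif funct3 == 0b110:
        r = a | b                    # OR
    else:
        r = a & b                    # AND
    return r & MASK32
```

For example, `0x002081B3` is the encoding of `ADD x3, x1, x2`, so `decode_rtype(0x002081B3)` yields rd=3, rs1=1, rs2=2.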

#### ADD rd rs1 rs2





| 0000000 | rs2   | rs1   | 000   | rd    | 0110011 |
|---------|-------|-------|-------|-------|---------|
| 7-bit   | 5-bit | 5-bit | 3-bit | 5-bit | 7-bit   |

if MEM[PC] == ADD rd rs1 rs2

GPR[rd] ← GPR[rs1] + GPR[rs2]

PC ← PC + 4



## **R-Type ALU Datapath**



### **Reg-Immediate ALU Instructions**

Assembly (e.g., reg-immediate addition)

ADDI rd, rs1, imm<sub>12</sub>

Machine encoding

| imm[11:0] | rs1   | 000   | rd    | 0010011 |
|-----------|-------|-------|-------|---------|
| 12-bit    | 5-bit | 3-bit | 5-bit | 7-bit   |

- Semantics
  - GPR[rd] ← GPR[rs1] + sign-extend (imm)
  - $PC \leftarrow PC + 4$
- Exceptions: none (ignore carry and overflow)
- Variations
  - Arithmetic: {ADDI} (RV32I has no SUBI; add a negative immediate instead)
  - Compare: {signed, unsigned} x {Set if Less Than Imm}
  - Logical: {ANDI, ORI, XORI}
  - Shifts by unsigned imm[4:0]: {SLLI, SRLI, SRAI}
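
The sign extension of the 12-bit immediate is the only new element relative to the register-register case. A minimal sketch, with hypothetical helper names `sign_extend` and `addi`, and the register file modeled as a plain list:

```python
def sign_extend(value, bits):
    # Treat the low `bits` bits of value as a two's-complement number.
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def addi(inst, gpr):
    # imm[11:0] sits in inst[31:20]; rs1 in inst[19:15]; rd in inst[11:7].
    imm = sign_extend(inst >> 20, 12)
    rs1 = (inst >> 15) & 0x1F
    rd = (inst >> 7) & 0x1F
    # Carry and overflow are ignored: the result wraps to 32 bits.
    return rd, (gpr[rs1] + imm) & 0xFFFFFFFF
```

Because the immediate is signed, subtracting a constant is just ADDI with a negative immediate: `ADDI x1, x2, -3` computes GPR[2] - 3.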

#### ADDI rd rs1 immediate<sub>12</sub>



| imm[11:0] | rs1   | 000   | rd    | 0010011 |
|-----------|-------|-------|-------|---------|
| 12-bit    | 5-bit | 3-bit | 5-bit | 7-bit   |

if MEM[PC] == ADDI rd rs1 immediate

GPR[rd] ← GPR[rs1] + sign-extend(immediate)

PC ← PC + 4



Combinational state update logic

### Datapath for R and I-type ALU Inst's



# Single-Cycle Datapath for Data Movement Instructions (i.e., Loads and Stores)

### **Load Instructions**

Assembly (e.g., load 4-byte word)

LW rd, offset<sub>12</sub>(base)

Machine encoding

| offset[11:0] | base  | 010   | rd    | 0000011 |
|--------------|-------|-------|-------|---------|
| 12-bit       | 5-bit | 3-bit | 5-bit | 7-bit   |

- Semantics
  - byte\_address<sub>32</sub> = sign-extend(offset<sub>12</sub>) + GPR[base]
  - GPR[rd] ← MEM<sub>32</sub>[byte\_address]
  - $PC \leftarrow PC + 4$
- Exceptions: none for now
- Variations: LW, LH, LHU, LB, LBU

e.g., LB :: GPR[rd] ← sign-extend(MEM<sub>8</sub>[byte\_address])

LBU :: GPR[rd] ← zero-extend(MEM<sub>8</sub>[byte\_address])

Note: RV32I memory is byte-addressable, little-endian
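
The load variations differ only in access size and in how the loaded value is extended to 32 bits. A sketch under stated assumptions: memory is modeled as a `bytearray`, the function names are hypothetical, and the funct3 values for LB, LH, LBU and LHU come from the RV32I encoding tables (only LW's `010` appears above).

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

# funct3 -> (size in bytes, sign-extend?)
LOAD_KINDS = {
    0b000: (1, True),    # LB
    0b001: (2, True),    # LH
    0b010: (4, False),   # LW (already 32 bits, no extension needed)
    0b100: (1, False),   # LBU
    0b101: (2, False),   # LHU
}

def load(mem, funct3, base_value, offset12):
    # byte_address = sign-extend(offset) + GPR[base], wrapped to 32 bits
    addr = (base_value + sign_extend(offset12, 12)) & 0xFFFFFFFF
    size, signed = LOAD_KINDS[funct3]
    # Little-endian: the byte at the lowest address is the least significant.
    raw = int.from_bytes(mem[addr:addr + size], "little")
    if signed:
        raw = sign_extend(raw, 8 * size)
    return raw & 0xFFFFFFFF
```

With the bytes `80 FF 34 12` stored at addresses 4 to 7, LW returns `0x1234FF80`, LB returns `0xFFFFFF80`, and LBU returns `0x00000080`.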

### LW Datapath



if MEM[PC]==LW rd offset<sub>12</sub>(base)

EA = sign-extend(offset) + GPR[base]

GPR[rd]  $\leftarrow$  MEM[ EA ]

PC  $\leftarrow$  PC + 4

IF ID EX MEM WB

Combinational state update logic

#### **Store Instructions**

Assembly (e.g., store 4-byte word)

SW rs2, offset<sub>12</sub>(base)

Machine encoding

| offset[11:5] | rs2   | base  | 010   | offset[4:0] | 0100011 |
|--------------|-------|-------|-------|-----------|---------|
| 7-bit        | 5-bit | 5-bit | 3-bit | 5-bit     | 7-bit   |

- Semantics
  - byte\_address<sub>32</sub> = sign-extend(offset<sub>12</sub>) + GPR[base]
  - MEM<sub>32</sub>[byte\_address] ← GPR[rs2]
  - $PC \leftarrow PC + 4$
- Exceptions: none for now
- Variations: SW, SH, SB

e.g., SB :: MEM<sub>8</sub>[byte\_address] ← (GPR[rs2])[7:0]
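
Unlike loads, the store offset is split across two instruction fields and must be reassembled before sign extension. A minimal sketch, assuming a `bytearray` memory and hypothetical function names; the funct3 values for SB and SH come from the RV32I encoding tables.

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def s_imm(inst):
    # offset[11:5] is inst[31:25]; offset[4:0] is inst[11:7].
    raw = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return sign_extend(raw, 12)

STORE_SIZES = {0b000: 1, 0b001: 2, 0b010: 4}   # SB, SH, SW

def store(mem, funct3, addr, rs2_value):
    size = STORE_SIZES[funct3]
    # Only the low-order bytes of GPR[rs2] are written, in little-endian order.
    low = rs2_value & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = low.to_bytes(size, "little")
```

For example, `0x0020A423` encodes `SW x2, 8(x1)`, and `s_imm(0x0020A423)` recovers the offset 8.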

### **SW Datapath**



if MEM[PC]==SW rs2 offset<sub>12</sub>(base)

EA = sign-extend(offset) + GPR[base]

MEM[EA] ← GPR[rs2]

PC ← PC + 4

IF ID EX MEM WB

Combinational state update logic

### **Load-Store Datapath**



### Datapath for Non-Control Flow Inst's



# Single-Cycle Datapath for Control Flow Instructions

### **Jump and Link Instruction**

Assembly

JAL rd, imm<sub>21</sub>

Note: implicit imm[0]=0

Machine encoding

| imm[20 10:1 11 19:12] | rd    | 1101111 |
|-----------------------|-------|---------|
| 20-bit                | 5-bit | 7-bit   |

(UJ-type format)

- Semantics
  - target = PC + sign-extend(imm<sub>21</sub>)
  - $GPR[rd] \leftarrow PC + 4$
  - PC ← target

How far can you jump?

- Exceptions: misaligned target (4-byte)
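
One way to work out how far JAL can reach is to reassemble the scrambled immediate and look at its extremes. The sketch below uses hypothetical function names and follows the field order in the encoding table: inst[31] is imm[20], inst[30:21] is imm[10:1], inst[20] is imm[11], and inst[19:12] is imm[19:12].

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def jal_imm(inst):
    imm = ((((inst >> 31) & 0x1) << 20)      # imm[20]
           | (((inst >> 21) & 0x3FF) << 1)   # imm[10:1]
           | (((inst >> 20) & 0x1) << 11)    # imm[11]
           | (((inst >> 12) & 0xFF) << 12))  # imm[19:12]
    return sign_extend(imm, 21)              # imm[0] is implicitly 0

def jal(inst, pc):
    rd = (inst >> 7) & 0x1F
    link = (pc + 4) & 0xFFFFFFFF             # GPR[rd] <- PC + 4
    target = (pc + jal_imm(inst)) & 0xFFFFFFFF
    return rd, link, target
```

A 21-bit signed offset with imm[0]=0 spans -1,048,576 to +1,048,574 bytes, so JAL reaches roughly ±1 MiB around the PC.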

### **Unconditional Jump Datapath**



### (Conditional) Branch Instructions

Assembly (e.g., branch if equal)

BEQ rs1, rs2, imm<sub>13</sub>

Note: implicit imm[0]=0

Machine encoding

| imm[12 10:5] | rs2   | rs1   | 000   | imm[4:1 11] | 1100011 |
|--------------|-------|-------|-------|-------------|---------|
| 7-bit        | 5-bit | 5-bit | 3-bit | 5-bit       | 7-bit   |

- Semantics
  - target = PC + sign-extend(imm<sub>13</sub>)
  - if GPR[rs1]==GPR[rs2] then PC ← target, else PC ← PC + 4

How far can you jump?

- Exceptions: misaligned target (4-byte) if taken
- Variations
  - BEQ, BNE, BLT, BGE, BLTU, BGEU
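
The branch semantics and variations can be sketched as below. The function names are hypothetical, and the funct3 values other than BEQ's `000` come from the RV32I encoding tables. The immediate follows the encoding table: inst[31] is imm[12], inst[30:25] is imm[10:5], inst[11:8] is imm[4:1], and inst[7] is imm[11].

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def b_imm(inst):
    imm = ((((inst >> 31) & 0x1) << 12)     # imm[12]
           | (((inst >> 25) & 0x3F) << 5)   # imm[10:5]
           | (((inst >> 8) & 0xF) << 1)     # imm[4:1]
           | (((inst >> 7) & 0x1) << 11))   # imm[11]
    return sign_extend(imm, 13)             # imm[0] is implicitly 0

def branch_taken(funct3, a, b):
    # a, b are 32-bit register values held as unsigned integers.
    sa, sb = sign_extend(a, 32), sign_extend(b, 32)
    return {
        0b000: a == b,    # BEQ
        0b001: a != b,    # BNE
        0b100: sa < sb,   # BLT
        0b101: sa >= sb,  # BGE
        0b110: a < b,     # BLTU
        0b111: a >= b,    # BGEU
    }[funct3]

def branch_next_pc(inst, pc, a, b):
    funct3 = (inst >> 12) & 0x7
    step = b_imm(inst) if branch_taken(funct3, a, b) else 4
    return (pc + step) & 0xFFFFFFFF
```

A 13-bit signed offset with imm[0]=0 spans -4096 to +4094 bytes, so conditional branches reach roughly ±4 KiB around the PC.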

### **Conditional Branch Datapath**



### **Adding Control to Datapath**



18-447-S22-L04-S27, James C. Hoe, CMU/ECE/CALCM, ©2022

[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### **Datapath Control Generation**



### **R-Type ALU Worksheet:**



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### I-Type ALU Worksheet:



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### LW Worksheet:



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### **SW Worksheet:**



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### **Branch Taken Worksheet:**



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

### **Branch Not-Taken Worksheet:**



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

## Jump (and Link?) ALU Worksheet:



[Figure 4.17 from book, Copyright © 2018 Elsevier Inc. All rights reserved.]

# **Single-Bit Control Signals**

| Signal   | When de-asserted                                           | When asserted                            | Equation                                 |
|----------|------------------------------------------------------------|------------------------------------------|------------------------------------------|
| UseImm   | 2<sup>nd</sup> ALU input from 2<sup>nd</sup> GPR read port | 2<sup>nd</sup> ALU input from immediate  | (opcode!=isRtype) &&<br>(opcode!=isBtype) |
| RFWrite  | GPR write disabled                                         | GPR write enabled                        | (opcode!=SW) &&<br>(opcode!=Bxx)         |
| MemtoRF  | Steer ALU result to GPR write port                         | Steer memory load to GPR write port      | opcode==LW/H/B                           |
| PCtoRF   | Steer above result to GPR write port                       | Steer PC+4 to GPR write port             | (opcode==JAL) \|\|<br>(opcode==JALR)     |
| MemRead  | Memory read disabled                                       | Memory read port returns load value      | opcode==LW/H/B                           |
| MemWrite | Memory write disabled                                      | Memory write enabled                     | opcode==SW/H/B                           |
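
The equations in the table translate almost directly into a decode function. This is a sketch rather than a complete decoder: the constant and function names are hypothetical, only the instruction classes covered in this lecture are handled, and the JALR opcode `1100111` comes from the RV32I encoding tables since it does not appear on the earlier slides.

```python
OP_RTYPE = 0b0110011   # register-register ALU
OP_ITYPE = 0b0010011   # register-immediate ALU
OP_LOAD = 0b0000011    # LW/LH/LHU/LB/LBU
OP_STORE = 0b0100011   # SW/SH/SB
OP_BRANCH = 0b1100011  # Bxx
OP_JAL = 0b1101111
OP_JALR = 0b1100111    # assumption: taken from the RV32I spec, not this lecture

def single_bit_controls(opcode):
    return {
        "UseImm": opcode not in (OP_RTYPE, OP_BRANCH),
        "RFWrite": opcode not in (OP_STORE, OP_BRANCH),
        "MemtoRF": opcode == OP_LOAD,
        "PCtoRF": opcode in (OP_JAL, OP_JALR),
        "MemRead": opcode == OP_LOAD,
        "MemWrite": opcode == OP_STORE,
    }
```

Every signal depends on the opcode alone, which is why control generation for the single-cycle design is purely combinational.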

# **Multi-Bit Control Signals**

| Signal     | Options | Equation |
|------------|---------|----------|
| ALU Op     | ADD, SUB, AND, OR, XOR, NOR, LT, and Shift<br>bcond: EQ, NE, GE, LT | case opcode<br>RTypeALU: according to funct3, funct7[5]<br>ITypeALU: according to funct3 only (except shift)<br>LW/SW/JALR: ADD<br>Bxx: SUB and select bcond function<br>default: ?? |
| ImmExtend  | Itype, ItypeU, Stype, SBtype, Utype, UJtype | select based on instruction format type<br>(may want to have separate extension units for primary ALU and PC-offset adder) |
| PCSrc      | PC+4, PCadder, ALU | case opcode<br>JAL: PC + immediate<br>JALR: GPR + immediate<br>Bxx: taken ? (PC + immediate) : (PC + 4)<br>default: PC + 4 |
| LoadExtend | W, H, HU, B, BU | case funct3 |
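
As an illustration of a multi-bit control, the PCSrc row can be sketched as a three-way selection. The names below are hypothetical, `imm` is assumed to be already sign-extended, and clearing bit 0 of the JALR target is an RV32I detail not shown in the table.

```python
OP_BRANCH = 0b1100011
OP_JAL = 0b1101111
OP_JALR = 0b1100111    # assumption: taken from the RV32I spec, not this lecture
MASK32 = 0xFFFFFFFF

def select_next_pc(opcode, pc, imm, rs1_value, taken):
    """Return (PCSrc option, next PC) following the case statement in the table."""
    if opcode == OP_JAL:
        return "PCadder", (pc + imm) & MASK32
    if opcode == OP_JALR:
        # Target comes from the main ALU (GPR + immediate); RV32I clears bit 0.
        return "ALU", (rs1_value + imm) & MASK32 & ~1
    if opcode == OP_BRANCH and taken:
        return "PCadder", (pc + imm) & MASK32
    return "PC+4", (pc + 4) & MASK32
```

Only JALR needs the ALU result as a PC source, which is why the PC-offset adder can be kept separate from the primary ALU.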



### Architecture vs Microarchitecture

Architectural Level





You can read a clock without knowing how it works

Microarchitecture Level





Realization Level

machined alloy gears vs stamped sheet metal



[Computer Architecture, Blaauw and Brooks, 1997]

### **Special Notices about Labs**

- Lab 1 fully rolling
  - RISC-V simulator (C code)
  - single-cycle RISC-V (RTL Verilog)
- To get yourself rolling (even if waitlisted)
  - get a GitHub account
  - find lab partners
- Please observe
  - lab assignments MUST be done in groups of 2 or 3
  - entire group MUST be present during check-off
  - 10% per day penalty for late labs, capped at 50%
  - all labs MUST be checked off to pass the course