# RISC-V Processor Datapath

#### Recap: Complete RV32I ISA

A blank field cell means the field to its left extends over those bits.

| inst[31:25] | inst[24:20] | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] | Instruction |
|---|---|---|---|---|---|---|
| imm[31:12] | | | | rd | 0110111 | LUI |
| imm[31:12] | | | | rd | 0010111 | AUIPC |
| imm[20\|10:1\|11\|19:12] | | | | rd | 1101111 | JAL |
| imm[11:0] | | rs1 | 000 | rd | 1100111 | JALR |
| imm[12\|10:5] | rs2 | rs1 | 000 | imm[4:1\|11] | 1100011 | BEQ |
| imm[12\|10:5] | rs2 | rs1 | 001 | imm[4:1\|11] | 1100011 | BNE |
| imm[12\|10:5] | rs2 | rs1 | 100 | imm[4:1\|11] | 1100011 | BLT |
| imm[12\|10:5] | rs2 | rs1 | 101 | imm[4:1\|11] | 1100011 | BGE |
| imm[12\|10:5] | rs2 | rs1 | 110 | imm[4:1\|11] | 1100011 | BLTU |
| imm[12\|10:5] | rs2 | rs1 | 111 | imm[4:1\|11] | 1100011 | BGEU |
| imm[11:0] | | rs1 | 000 | rd | 0000011 | LB |
| imm[11:0] | | rs1 | 001 | rd | 0000011 | LH |
| imm[11:0] | | rs1 | 010 | rd | 0000011 | LW |
| imm[11:0] | | rs1 | 100 | rd | 0000011 | LBU |
| imm[11:0] | | rs1 | 101 | rd | 0000011 | LHU |
| imm[11:5] | rs2 | rs1 | 000 | imm[4:0] | 0100011 | SB |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100011 | SH |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100011 | SW |
| imm[11:0] | | rs1 | 000 | rd | 0010011 | ADDI |
| imm[11:0] | | rs1 | 010 | rd | 0010011 | SLTI |
| imm[11:0] | | rs1 | 011 | rd | 0010011 | SLTIU |
| imm[11:0] | | rs1 | 100 | rd | 0010011 | XORI |
| imm[11:0] | | rs1 | 110 | rd | 0010011 | ORI |
| imm[11:0] | | rs1 | 111 | rd | 0010011 | ANDI |
| 0000000 | shamt | rs1 | 001 | rd | 0010011 | SLLI |
| 0000000 | shamt | rs1 | 101 | rd | 0010011 | SRLI |
| 0100000 | shamt | rs1 | 101 | rd | 0010011 | SRAI |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | SLL |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | SLT |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | SLTU |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | XOR |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | SRL |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | SRA |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | OR |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | AND |
| 0000 pred succ | | 00000 | 000 | 00000 | 0001111 | FENCE |
| 0000 0000 0000 | | 00000 | 001 | 00000 | 0001111 | FENCE.I |
| 000000000000 | | 00000 | 000 | 00000 | 1110011 | ECALL |
| 000000000001 | | 00000 | 000 | 00000 | 1110011 | EBREAK |
| csr | | rs1 | 001 | rd | 1110011 | CSRRW |
| csr | | rs1 | 010 | rd | 1110011 | CSRRS |
| csr | | rs1 | 011 | rd | 1110011 | CSRRC |
| csr | | zimm | 101 | rd | 1110011 | CSRRWI |
| csr | | zimm | 110 | rd | 1110011 | CSRRSI |
| csr | | zimm | 111 | rd | 1110011 | CSRRCI |

The CSR instructions are not covered in this course.

#### State Required by RV32I ISA

Each instruction reads and updates this state during execution:

- Registers (x0..x31)
  - Register file (or regfile) Reg holds 32 registers x 32 bits/register: Reg[0].. Reg[31]
  - First register read specified by rs1 field in instruction
  - Second register read specified by rs2 field in instruction
  - Write register (destination) specified by rd field in instruction
  - x0 is always 0 (writes to Reg[0] are ignored)
- Program Counter (PC)
  - Holds address of current instruction
- Memory (MEM)
  - Holds both instructions & data, in one 32-bit byte-addressed memory space
  - We'll use separate memories for instructions (IMEM) and data (DMEM)
    - Later we'll replace these with instruction and data caches
  - Instructions are read (fetched) from instruction memory (assume IMEM read-only)
  - Load/store instructions access data memory
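The state listed above can be modelled in a few lines. This is a minimal Python sketch, not a course-provided simulator: the class and method names (`RegFile`, `Memory`, `MachineState`) are hypothetical, memory is stored sparsely as a dictionary of bytes, and words are assumed little-endian (the byte order RISC-V uses).

```python
MASK32 = 0xFFFFFFFF


class RegFile:
    """32 registers x 32 bits; x0 always reads 0 and writes to it are ignored."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, r: int) -> int:
        return 0 if r == 0 else self.regs[r]

    def write(self, r: int, value: int) -> None:
        if r != 0:  # writes to Reg[0] are ignored
            self.regs[r] = value & MASK32


class Memory:
    """Byte-addressed memory with 32-bit addresses; unwritten bytes read as 0."""

    def __init__(self) -> None:
        self.data: dict[int, int] = {}

    def load_word(self, addr: int) -> int:
        # Little-endian: the byte at the lowest address is the least significant.
        return sum(self.data.get((addr + i) & MASK32, 0) << (8 * i) for i in range(4))

    def store_word(self, addr: int, value: int) -> None:
        for i in range(4):
            self.data[(addr + i) & MASK32] = (value >> (8 * i)) & 0xFF


class MachineState:
    def __init__(self) -> None:
        self.reg = RegFile()
        self.pc = 0  # address of the current instruction
        self.imem = Memory()  # instructions; treated as read-only once loaded
        self.dmem = Memory()  # data accessed by loads and stores
```

Keeping IMEM and DMEM as two separate objects mirrors the split described above, which is what later allows one instruction fetch and one data access in the same cycle.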

#### One-Instruction-Per-Cycle RISC-V Machine



- On every tick of the clock, the computer executes one instruction
- Current state outputs drive the inputs to the combinational logic, whose outputs settle at the values of the next state before the next clock edge
- At the rising clock edge, all the state elements are updated with the combinational logic outputs, and execution moves to the next clock cycle
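The update pattern in these bullets can be sketched in Python as below. The state is immutable during a cycle: the combinational logic only reads the current state and produces the next one, and the "clock edge" swaps in all new values at once. The `combinational_logic` function is a hypothetical stand-in that treats every instruction as `add x1, x1, x2`, purely to show the pattern.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    pc: int
    regs: tuple  # 32 register values; regs[0] stays 0


def combinational_logic(s: State) -> State:
    """Hypothetical datapath: behaves as if every instruction were add x1, x1, x2."""
    regs = list(s.regs)
    regs[1] = (s.regs[1] + s.regs[2]) & 0xFFFFFFFF
    return replace(s, pc=(s.pc + 4) & 0xFFFFFFFF, regs=tuple(regs))


def run(s: State, cycles: int) -> State:
    for _ in range(cycles):
        nxt = combinational_logic(s)  # outputs settle during the cycle
        s = nxt  # rising clock edge: every state element updates together
    return s
```

Because `State` is frozen, the combinational logic cannot change state partway through a cycle, which matches the rule that state elements change only at the clock edge.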

#### Basic Phases of Instruction Execution



#### Implementing the **add** instruction

| funct7 | rs2 | rs1 | funct3 | rd | opcode | Instruction |
|---|---|---|---|---|---|---|
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |

```
add rd, rs1, rs2
```

The instruction makes two changes to the machine's state:

- Reg[rd] = Reg[rs1] + Reg[rs2]
- PC = PC + 4
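The two state changes can be written out as a small Python sketch that decodes the R-format fields and applies them. The helper names `decode_r` and `execute_add` are hypothetical; `0x002081B3` is the encoding of `add x3, x1, x2` (funct7=0000000, rs2=2, rs1=1, funct3=000, rd=3, opcode=0110011).

```python
MASK32 = 0xFFFFFFFF


def decode_r(inst: int) -> dict:
    """Split a 32-bit R-format instruction into its fields."""
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1": (inst >> 15) & 0x1F,
        "rs2": (inst >> 20) & 0x1F,
        "funct7": (inst >> 25) & 0x7F,
    }


def execute_add(inst: int, regs: list, pc: int) -> int:
    """Apply add's two state changes: update Reg[rd], then return the new PC."""
    f = decode_r(inst)
    if (f["opcode"], f["funct3"], f["funct7"]) != (0b0110011, 0b000, 0b0000000):
        raise ValueError("not an add instruction")
    if f["rd"] != 0:  # x0 stays 0
        regs[f["rd"]] = (regs[f["rs1"]] + regs[f["rs2"]]) & MASK32  # 32-bit wraparound
    return (pc + 4) & MASK32
```

For example, with `regs[1] = 5` and `regs[2] = 7`, `execute_add(0x002081B3, regs, 0x1000)` leaves 12 in `regs[3]` and returns `0x1004`.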

# Datapath for add



# Timing Diagram for add



#### Implementing the **sub** instruction

| funct7 | rs2 | rs1 | funct3 | rd | opcode | Instruction |
|---|---|---|---|---|---|---|
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |

`sub rd, rs1, rs2`

- Almost the same as add, except the operands are subtracted instead of added
- inst[30] selects between add and subtract
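Since ADD and SUB differ only in funct7 = 0000000 vs 0100000, a single instruction bit is enough to choose the ALU operation. A minimal Python sketch (the function name `add_sub` is hypothetical):

```python
MASK32 = 0xFFFFFFFF


def add_sub(inst: int, a: int, b: int) -> int:
    """ALU for ADD/SUB: inst[30] (bit 5 of funct7) picks subtract over add."""
    if (inst >> 30) & 1:
        return (a - b) & MASK32  # SUB: funct7 = 0100000
    return (a + b) & MASK32  # ADD: funct7 = 0000000
```

With `add x3, x1, x2` (`0x002081B3`) and `sub x3, x1, x2` (`0x402081B3`), operands 5 and 7 give 12 and `0xFFFFFFFE` (-2 in two's complement), respectively.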

#### Datapath for add/sub



#### Implementing other R-Format instructions

| funct7 | rs2 | rs1 | funct3 | rd | opcode | Instruction |
|---|---|---|---|---|---|---|
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | SLL |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | SLT |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | SLTU |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | XOR |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | SRL |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | SRA |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | OR |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | AND |

All are implemented by decoding the funct3 and funct7 fields and selecting the appropriate ALU function.
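The decoding step can be sketched as a Python function that maps (funct3, funct7) to an ALU operation for the ten R-type instructions in the table. This is an illustration with hypothetical names (`r_type_alu`, `to_signed`); it does not reject illegal funct3/funct7 combinations. Shifts use only the low 5 bits of rs2, as RV32I specifies.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def r_type_alu(funct3: int, funct7: int, a: int, b: int) -> int:
    """Select the ALU function for an RV32I R-type instruction from funct3/funct7."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # shift amount: low 5 bits of rs2
    alt = funct7 == 0b0100000  # selects SUB instead of ADD, SRA instead of SRL
    if funct3 == 0b000:
        return (a - b if alt else a + b) & MASK32  # SUB / ADD
    if funct3 == 0b001:
        return (a << shamt) & MASK32  # SLL
    if funct3 == 0b010:
        return int(to_signed(a) < to_signed(b))  # SLT (signed)
    if funct3 == 0b011:
        return int(a < b)  # SLTU (unsigned)
    if funct3 == 0b100:
        return a ^ b  # XOR
    if funct3 == 0b101:
        return (to_signed(a) >> shamt) & MASK32 if alt else a >> shamt  # SRA / SRL
    if funct3 == 0b110:
        return a | b  # OR
    return a & b  # AND (funct3 = 111)
```

Note how funct7 matters only for funct3 = 000 and 101; for the other eight functions funct3 alone selects the operation.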

#### Implementing the **addi** instruction

RISC-V Assembly Instruction: `addi x15,x1,-50`

|              | inst[31:20]  | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] |
|--------------|--------------|-------------|-------------|------------|-----------|
| Field        | imm[11:0]    | rs1         | funct3      | rd         | opcode    |
| Width (bits) | 12           | 5           | 3           | 5          | 7         |
| Encoding     | 111111001110 | 00001       | 000         | 01111      | 0010011   |
| Meaning      | imm=-50      | rs1=1       | ADD         | rd=15      | OP-Imm    |

4/10/2020
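As a cross-check of the encoding in the table, here is a minimal Python sketch that packs I-format fields into a 32-bit word. `encode_i` and `OP_IMM` are hypothetical names chosen for illustration.

```python
def encode_i(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack I-format fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-format immediate must fit in 12 signed bits")
    # imm & 0xFFF keeps the low 12 bits of the two's-complement value.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP_IMM = 0b0010011
inst = encode_i(-50, rs1=1, funct3=0b000, rd=15, opcode=OP_IMM)  # addi x15, x1, -50
print(f"{inst:#010x} {inst:032b}")
```

The printed bit string splits into exactly the five fields shown in the table.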

#### Datapath for add/sub



### Adding addi to datapath



#### I-Format immediates






- High 12 bits of instruction (inst[31:20]) copied to low 12 bits of immediate (imm[11:0])
- Immediate is sign-extended by copying value of inst[31] to fill the upper 20 bits of the immediate value (imm[31:12])
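The two bullets translate directly into bit operations. A minimal Python sketch (function names are hypothetical), using the `addi x15, x1, -50` word `0xFCE08793` as a test input:

```python
def i_imm(inst: int) -> int:
    """32-bit I-format immediate: imm[11:0] = inst[31:20], imm[31:12] = copies of inst[31]."""
    imm = (inst >> 20) & 0xFFF  # high 12 instruction bits -> low 12 immediate bits
    if inst & 0x80000000:  # inst[31] is the sign bit
        imm |= 0xFFFFF000  # replicate it into imm[31:12]
    return imm  # unsigned 32-bit bit pattern


def as_signed(x: int) -> int:
    """Read a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x
```

For `0xFCE08793`, `i_imm` returns `0xFFFFFFCE`, which `as_signed` reads as -50.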

#### Adding addi to datapath



#### Implementing Load Word instruction

RISC-V Assembly Instruction: `lw x14, 8(x2)`

|              | inst[31:20]  | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] |
|--------------|--------------|-------------|-------------|------------|-----------|
| Field        | imm[11:0]    | rs1         | funct3      | rd         | opcode    |
| Width (bits) | 12           | 5           | 3           | 5          | 7         |
| Encoding     | 000000001000 | 00010       | 010         | 01110      | 0000011   |
| Meaning      | imm=+8       | rs1=2       | LW          | rd=14      | LOAD      |

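lw reuses the I-format immediate and the ALU adder, then reads data memory. Below is a minimal Python sketch with hypothetical helper names. It assumes DMEM is a little-endian `bytearray` and ignores misaligned addresses. `0x00812703` is the word from the table (`lw x14, 8(x2)`).

```python
MASK32 = 0xFFFFFFFF


def sign_extend12(x: int) -> int:
    """Interpret a 12-bit field as a signed value."""
    return x - 0x1000 if x & 0x800 else x


def execute_lw(inst: int, regs: list, dmem: bytearray) -> None:
    """lw rd, imm(rs1): Reg[rd] = DMEM[Reg[rs1] + imm] (one 32-bit word)."""
    imm = sign_extend12((inst >> 20) & 0xFFF)  # same immediate as addi
    rs1 = (inst >> 15) & 0x1F
    rd = (inst >> 7) & 0x1F
    addr = (regs[rs1] + imm) & MASK32  # the ALU computes the address, as for addi
    word = int.from_bytes(dmem[addr:addr + 4], "little")
    if rd != 0:  # x0 stays 0
        regs[rd] = word
```

Compared with addi, the only new datapath step is the DMEM read between the ALU and the register write-back.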

### Adding addi to datapath



# Adding **lw** to datapath



# Adding **lw** to datapath



#### All RV32 Load Instructions

| imm[11:0] | rs1 | 000 | rd | 0000011 | LB  |
|-----------|-----|-----|----|---------|-----|
| imm[11:0] | rs1 | 001 | rd | 0000011 | LH  |
| imm[11:0] | rs1 | 010 | rd | 0000011 | LW  |
| imm[11:0] | rs1 | 100 | rd | 0000011 | LBU |
| imm[11:0] | rs1 | 101 | rd | 0000011 | LHU |

funct3 field encodes size and signedness of load data

 Supporting the narrower loads requires additional circuits to extract the correct byte/halfword from the value loaded from memory, and sign- or zero-extend the result to 32 bits before writing back to register file.
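Those extra circuits can be modelled in Python as below. This is a sketch under stated assumptions: memory returns the aligned 32-bit word that contains the address, the word is little-endian, and halfword accesses are 2-byte aligned. The function name `extract_load` is hypothetical.

```python
def extract_load(word: int, addr: int, funct3: int) -> int:
    """Select the byte/halfword addressed by addr[1:0] from an aligned word and
    sign- or zero-extend it to 32 bits according to funct3."""
    shift = 8 * (addr & 0b11)  # byte offset within the word
    if funct3 == 0b000:  # LB: sign-extend byte
        v = (word >> shift) & 0xFF
        return (v | 0xFFFFFF00) if v & 0x80 else v
    if funct3 == 0b100:  # LBU: zero-extend byte
        return (word >> shift) & 0xFF
    if funct3 == 0b001:  # LH: sign-extend halfword
        v = (word >> shift) & 0xFFFF
        return (v | 0xFFFF0000) if v & 0x8000 else v
    if funct3 == 0b101:  # LHU: zero-extend halfword
        return (word >> shift) & 0xFFFF
    if funct3 == 0b010:  # LW: whole word
        return word & 0xFFFFFFFF
    raise ValueError(f"not an RV32I load funct3: {funct3:03b}")
```

For the word `0x80F17F01`, LB from an address ending in 3 returns `0xFFFFFF80`, while LBU returns `0x80`.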

#### Implementing Store Word instruction

RISC-V Assembly Instruction: `sw x14, 8(x2)`
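sw uses the S-format, where the 12-bit immediate is split into imm[11:5] (inst[31:25]) and imm[4:0] (inst[11:7]) so that rs1 and rs2 stay in the same positions as in R-format. A minimal Python encoder sketch (the name `encode_s` is hypothetical):

```python
def encode_s(imm: int, rs2: int, rs1: int, funct3: int, opcode: int = 0b0100011) -> int:
    """Pack S-format fields: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("S-format immediate must fit in 12 signed bits")
    imm &= 0xFFF  # 12-bit two's-complement pattern
    return (
        ((imm >> 5) << 25)  # imm[11:5] -> inst[31:25]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)  # imm[4:0] -> inst[11:7]
        | opcode
    )
```

Under these assumptions, `encode_s(8, rs2=14, rs1=2, funct3=0b010)` gives `0x00E12423` for `sw x14, 8(x2)`: rs2 = x14 is the register being stored and rs1 = x2 supplies the base address.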



# Adding **lw** to datapath



### Adding **sw** to datapath



# Adding **sw** to datapath




#### I & S Immediate Generator




- Just need a 5-bit mux to select between the two positions where the low five bits of the immediate can reside in the instruction (inst[24:20] for I-format, inst[11:7] for S-format)
- Other bits in immediate are wired to fixed positions in instruction
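The generator described by these bullets can be sketched in Python as follows. `imm_gen_i_s` and its `is_store` select input are hypothetical names standing in for the real control signal. Every bit except imm[4:0] comes from the same instruction bits in both formats.

```python
def imm_gen_i_s(inst: int, is_store: bool) -> int:
    """I/S immediate generator returning a 32-bit pattern.
    imm[31:11] = inst[31] and imm[10:5] = inst[30:25] in both formats;
    only imm[4:0] moves, so one 5-bit 2-way mux suffices."""
    sign = 0xFFFFF800 if inst & 0x80000000 else 0  # imm[31:11] <- inst[31]
    middle = ((inst >> 25) & 0x3F) << 5  # imm[10:5] <- inst[30:25]
    low_i = (inst >> 20) & 0x1F  # I-format: imm[4:0] <- inst[24:20]
    low_s = (inst >> 7) & 0x1F  # S-format: imm[4:0] <- inst[11:7]
    low = low_s if is_store else low_i  # the 5-bit mux
    return sign | middle | low
```

For `addi x15, x1, -50` (`0xFCE08793`) with `is_store=False` this yields `0xFFFFFFCE` (-50), and for `sw x14, 8(x2)` (`0x00E12423`) with `is_store=True` it yields 8.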

CS 61c


### Implementing Branches



- B-format is mostly the same as S-format, with two register sources (rs1/rs2) and a 12-bit immediate
- But now immediate represents values -4096 to +4094 in 2-byte increments
- The 12 immediate bits encode even 13-bit signed byte offsets (lowest bit of offset is always zero, so no need to store it)
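The range follows from treating the 12 stored bits as a signed value counted in 2-byte units. Here $\text{imm}_{12}$ is only notation for the stored field:

$$
\text{offset} = 2 \cdot \text{imm}_{12}, \qquad \text{imm}_{12} \in [-2^{11},\, 2^{11}-1] \;\Longrightarrow\; \text{offset} \in [-2^{12},\, 2^{12}-2] = [-4096,\, +4094]
$$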

### Adding **sw** to datapath



## Adding branches to datapath



### Adding branches to datapath



#### Branch Comparator



- BrEq = 1 if A = B
- BrLT = 1 if A < B
- BrUn = 1 selects unsigned comparison for BrLT; 0 selects signed

- BGE branch: A >= B if !(A < B)
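A Python sketch of the comparator, plus one way the six branch conditions could be derived from its two outputs. BrEq, BrLT, and BrUn come from the slide. The function names, and the choice to set BrUn from funct3 for BLTU/BGEU, are assumptions for illustration.

```python
def branch_comparator(a: int, b: int, br_un: bool) -> tuple[bool, bool]:
    """Return (BrEq, BrLT) for two 32-bit operands; BrUn selects unsigned BrLT."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    br_eq = a == b
    if br_un:
        br_lt = a < b
    else:  # reinterpret as two's-complement before comparing
        sa = a - (1 << 32) if a & 0x80000000 else a
        sb = b - (1 << 32) if b & 0x80000000 else b
        br_lt = sa < sb
    return br_eq, br_lt


def branch_taken(funct3: int, a: int, b: int) -> bool:
    """Decide a conditional branch; BGE/BGEU are taken when !(A < B)."""
    br_un = funct3 in (0b110, 0b111)  # BLTU, BGEU
    br_eq, br_lt = branch_comparator(a, b, br_un)
    return {
        0b000: br_eq,  # BEQ
        0b001: not br_eq,  # BNE
        0b100: br_lt,  # BLT
        0b101: not br_lt,  # BGE
        0b110: br_lt,  # BLTU
        0b111: not br_lt,  # BGEU
    }[funct3]
```

Only two comparator outputs are needed because each "greater or equal" branch is simply the negation of the corresponding "less than".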

# Multiply Branch Immediates by Shift?

- 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
- Standard approach: treat immediate as in range -2048..+2047, then shift left by 1 bit to multiply by 2 for branches



With this approach, each instruction immediate bit can appear in one of two places in the output immediate value, so one 2-way mux is needed per bit

#### RISC-V Branch Immediates

- 12-bit immediate encodes PC-relative offset of -4096 to +4094 bytes in multiples of 2 bytes
- RISC-V approach: keep 11 immediate bits in fixed positions in the output value, and move the LSB of the S-format immediate (inst[7]) to bit 11 of the B-format immediate



Only one bit changes position between S and B, so only need a single-bit 2-way mux
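The claim can be checked with a Python sketch that decodes both immediates (function names are hypothetical). imm[10:1] comes from the same instruction bits in both formats. Only inst[7] changes role: it is imm[0] in S and imm[11] in B, and imm[0] of B is always 0.

```python
def s_imm(inst: int) -> int:
    """S-format immediate (signed): imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]."""
    raw = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return raw - 0x1000 if raw & 0x800 else raw


def b_imm(inst: int) -> int:
    """B-format immediate (signed, even): imm[12] = inst[31], imm[11] = inst[7],
    imm[10:5] = inst[30:25], imm[4:1] = inst[11:8], imm[0] = 0."""
    raw = (
        (((inst >> 31) & 0x1) << 12)
        | (((inst >> 7) & 0x1) << 11)
        | (((inst >> 25) & 0x3F) << 5)
        | (((inst >> 8) & 0xF) << 1)
    )
    return raw - 0x2000 if raw & 0x1000 else raw
```

For example, `beq x0, x0, -4` encodes as `0xFE000EE3`, and `b_imm` recovers -4 from it.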

#### RISC-V Immediate Encoding

Instruction encodings, inst[31:0]:

| Format | inst[31:25] | inst[24:20] | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] |
|---|---|---|---|---|---|---|
| R-type | funct7 | rs2 | rs1 | funct3 | rd | opcode |
| I-type | imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode |
| S-type | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
| B-type | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |

32-bit immediates produced, imm[31:0]:

| Immediate | imm[31:12] | imm[11] | imm[10:5] | imm[4:1] | imm[0] |
|---|---|---|---|---|---|
| I-immediate | inst[31] | inst[31] | inst[30:25] | inst[24:21] | inst[20] |
| S-immediate | inst[31] | inst[31] | inst[30:25] | inst[11:8] | inst[7] |
| B-immediate | inst[31] | inst[7] | inst[30:25] | inst[11:8] | 0 |

- Upper bits are always sign-extended from inst[31]
- Only bit 7 of the instruction changes role in the immediate between S and B

#### Implementing **JALR** Instruction (I-Format)

|              | inst[31:20]  | inst[19:15] | inst[14:12] | inst[11:7] | inst[6:0] |
|--------------|--------------|-------------|-------------|------------|-----------|
| Field        | imm[11:0]    | rs1         | funct3      | rd         | opcode    |
| Width (bits) | 12           | 5           | 3           | 5          | 7         |
| Meaning      | offset[11:0] | base        | 0           | dest       | JALR      |

- JALR rd, rs1, immediate
  - Writes PC+4 to Reg[rd] (return address)
  - Sets PC = Reg[rs1] + immediate
  - Uses same immediates as arithmetic and loads
    - no multiplication by 2 bytes
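A minimal Python sketch of JALR's two state changes (the function name is hypothetical). Two details go beyond the slide: the RISC-V specification also clears the lowest bit of the computed target, and rs1 must be read before rd is written because they may be the same register. `0x00008067` is `jalr x0, x1, 0` (the usual `ret`).

```python
MASK32 = 0xFFFFFFFF


def execute_jalr(inst: int, regs: list, pc: int) -> int:
    """jalr rd, rs1, imm: Reg[rd] = PC + 4; returns new PC = Reg[rs1] + imm.
    The immediate is the plain I-format one, not scaled by 2."""
    imm = (inst >> 20) & 0xFFF
    if imm & 0x800:
        imm -= 0x1000  # sign-extend the 12-bit immediate
    rs1 = (inst >> 15) & 0x1F
    rd = (inst >> 7) & 0x1F
    # Read rs1 first (rd may equal rs1); per the spec, bit 0 of the target is cleared.
    target = ((regs[rs1] + imm) & MASK32) & ~1
    if rd != 0:
        regs[rd] = (pc + 4) & MASK32  # return address
    return target
```

With `regs[1] = 0x3000` and PC = `0x100`, `jalr x1, 4(x1)` (`0x004080E7`) jumps to `0x3004` and leaves `0x104` in x1.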

## Adding branches to datapath



### Adding **jalr** to datapath



### Adding **jalr** to datapath



## Implementing **jal** Instruction



- JAL saves PC+4 in Reg[rd] (the return address)
- Sets PC = PC + offset (PC-relative jump)
- Target somewhere within ±2<sup>19</sup> locations, 2 bytes apart
  - ±2<sup>18</sup> 32-bit instructions
- Immediate encoding optimized similarly to branch instruction to reduce hardware cost
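A Python sketch of JAL's immediate decode and execution, with hypothetical function names. inst[31:12] holds imm[20|10:1|11|19:12]. As with branches, imm[0] is implicitly 0, giving a 21-bit even offset of ±2<sup>20</sup> bytes (±2<sup>19</sup> 2-byte locations).

```python
MASK32 = 0xFFFFFFFF


def j_imm(inst: int) -> int:
    """J-format immediate (signed, even) from inst[31:12] = imm[20|10:1|11|19:12]."""
    raw = (
        (((inst >> 31) & 0x1) << 20)  # imm[20]    <- inst[31]
        | (((inst >> 21) & 0x3FF) << 1)  # imm[10:1]  <- inst[30:21]
        | (((inst >> 20) & 0x1) << 11)  # imm[11]    <- inst[20]
        | (((inst >> 12) & 0xFF) << 12)  # imm[19:12] <- inst[19:12]
    )
    return raw - (1 << 21) if raw & (1 << 20) else raw


def execute_jal(inst: int, regs: list, pc: int) -> int:
    """jal rd, offset: Reg[rd] = PC + 4; returns new PC = PC + offset."""
    rd = (inst >> 7) & 0x1F
    if rd != 0:
        regs[rd] = (pc + 4) & MASK32  # return address
    return (pc + j_imm(inst)) & MASK32
```

For example, `jal x1, 8` encodes as `0x008000EF`; executed at PC = `0x100`, it writes `0x104` to x1 and jumps to `0x108`.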

# Adding **jal** to datapath



### Adding **jal** to datapath



# Single-Cycle RISC-V RV32I Datapath



#### And in Conclusion, ...

- Universal datapath
  - Capable of executing all RISC-V instructions in one cycle each
  - Not all units (hardware) used by all instructions
- 5 Phases of execution
  - IF, ID, EX, MEM, WB
  - Not all instructions are active in all phases
- Controller specifies how to execute instructions
  - What new instructions can be added with just more control?