

### Lecture #11

# The Processor: Datapath



## Lecture #11: Processor: Datapath (1/2)

- 1. Building a Processor: Datapath & Control
- 2. MIPS Processor: Implementation
- 3. Instruction Execution Cycle (Recap)
- 4. MIPS Instruction Execution
- 5. Let's Build a MIPS Processor
  - 5.1 Fetch Stage
  - 5.2 Decode Stage
  - 5.3 ALU Stage
  - 5.4 Memory Stage
  - 5.5 Register Write Stage

## Lecture #11: Processor: Datapath (2/2)

- 6. The Complete Datapath!
- 7. Brief Recap
- 8. From C to Execution
  - 8.1 Writing C program
  - 8.2 Compiling to MIPS
  - 8.3 Assembling to Binaries
  - 8.4 Execution (Datapath)

### 1. Building a Processor: Datapath & Control

Two major components for a processor

### Datapath

- Collection of components that process data
- Performs the arithmetic, logical and memory operations

### Control

 Tells the datapath, memory and I/O devices what to do according to program instructions

## 2. MIPS Processor: Implementation

- Simplest possible implementation of a subset of the core MIPS ISA:
  - Arithmetic and Logical operations
    - add, sub, and, or, addi, andi, ori, slt
  - Data transfer instructions
    - lw, sw
  - Branches
    - beq, bne
- Shift instructions (s11, sr1) and J-type instructions (j) will not be discussed:
  - Left as exercises ©

# 3. Instruction Execution Cycle (Basic)



#### 1. Fetch:

- Get instruction from memory
- Address is in Program Counter (PC) Register

#### 2. Decode:

Find out the operation required

### 3. Operand Fetch:

Get operand(s) needed for operation

#### 4. Execute:

Perform the required operation

### 5. Result Write (Store):

Store the result of the operation

# 4. MIPS Instruction Execution (1/2)

- Show the actual steps for 3 representative MIPS instructions
- Fetch and Decode stages not shown:
  - The standard steps are performed

|                  | add <b>\$3</b> , <b>\$1</b> , <b>\$2</b>                        | lw \$3, 20(\$1)                                                                   | beq \$1, \$2, ofst                                                        |
|------------------|-----------------------------------------------------------------|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------|
| Fetch            | standard                                                        | standard                                                                          | standard                                                                  |
| Decode           |                                                                 |                                                                                   |                                                                           |
| Operand<br>Fetch | <ul><li>Read [\$1] as opr1</li><li>Read [\$2] as opr2</li></ul> | <ul><li>Read [\$1] as opr1</li><li>Use 20 as opr2</li></ul>                       | <ul><li>Read [\$1] as opr1</li><li>Read [\$2] as opr2</li></ul>           |
| Execute          | Result = opr1 + opr2                                            | <ul><li> MemAddr = opr1 + opr2</li><li> Use MemAddr to read from memory</li></ul> | Taken = (opr1 == opr2)? Target = ( <b>PC</b> +4) + <b>ofst</b> × <b>4</b> |
| Result<br>Write  | Result stored in \$3                                            | Memory data stored in \$3                                                         | if ( <i>Taken</i> ) <b>PC</b> = <i>Target</i>                             |

- opr = operand
- MemAddr = Memory Address
- ofst = offset

## 4. MIPS Instruction Execution (2/2)

- Design changes:
  - Merge Decode and Operand Fetch Decode is simple for MIPS
  - Split Execute into ALU (Calculation) and Memory Access

|                        | add \$3, \$1, \$2                                                   | lw \$3, 20(\$1)                                                 | beq \$1, \$2, ofst                                                            |
|------------------------|---------------------------------------------------------------------|-----------------------------------------------------------------|-------------------------------------------------------------------------------|
| Fetch                  | Read inst. at [PC]                                                  | Read inst. at [PC]                                              | Read inst. at [PC]                                                            |
| Decode & Operand Fetch | <ul><li>○ Read [\$1] as opr1</li><li>○ Read [\$2] as opr2</li></ul> | <ul><li>○ Read [\$1] as opr1</li><li>○ Use 20 as opr2</li></ul> | <ul><li>○ Read [\$1] as opr1</li><li>○ Read [\$2] as opr2</li></ul>           |
| ALU                    | Result = opr1 + opr2                                                | MemAddr = opr1 + opr2                                           | Taken = (opr1 == opr2 )?<br>Target = ( <b>PC</b> +4) + <b>ofst</b> × <b>4</b> |
| Memory<br>Access       |                                                                     | Use <i>MemAddr</i> to read from memory                          |                                                                               |
| Result<br>Write        | Result stored in \$3                                                | Memory data stored in \$3                                       | if ( <i>Taken</i> ) <b>PC</b> = <i>Target</i>                                 |

### 5. Let's Build a MIPS Processor

- What we are going to do:
  - Look at each stage closely, figure out the requirements and processes
  - Sketch a high level block diagram, then zoom in for each elements
  - With the simple starting design, check whether different type of instructions can be handled:
    - Add modifications when needed
- → Study the design from the viewpoint of a designer, instead of a "tourist" ②

# 5.1 Fetch Stage: Requirements

- 1. Fetch
- 2. Decode
- 3. ALU
- 4. Memory
- 5. RegWrite

- Instruction Fetch Stage:
  - 1. Use the Program Counter (PC) to fetch the instruction from memory
    - PC is implemented as a special register in the processor
  - 2. Increment the PC by 4 to get the address of the next instruction:
    - How do we know the next instruction is at PC+4?
    - Note the exception when branch/jump instruction is executed
- Output to the next stage (Decode):
  - The instruction to be executed

# 5.1 Fetch Stage: Block Diagram



## 5.1 Element: Instruction Memory

- Storage element for the instructions
  - It is a sequential circuit (to be covered later)
  - Has an internal state that stores information
  - Clock signal is assumed and not shown



- Supply instruction given the address
  - Given instruction address M as input, the memory outputs the content at address M
  - Conceptual diagram of the memory layout is given on the right →



### 5.1 Element: Adder

 Combinational logic to implement the addition of two numbers

### Inputs:

Two 32-bit numbers A, B

### Output:

Sum of the input numbers, A + B



### 5.1 The Idea of Clocking

- It seems that we are reading and updating the PC at the same time:
  - How can it works properly?

### Magic of clock:

PC is read during the first half of the clock period and it is updated with PC+4 at the next rising clock edge



# 5.2 Decode Stage: Requirements

- 1. Fetch
- 2. Decode
- 3. ALU
- 4. Memory
- 5. RegWrite

- Instruction Decode Stage:
  - Gather data from the instruction fields:
    - Read the **opcode** to determine instruction type and field lengths
    - 2. Read data from all necessary registers
      - Can be two (e.g. add), one (e.g. addi) or zero (e.g. j)
- Input from previous stage (Fetch):
  - Instruction to be executed
- Output to the next stage (ALU):
  - Operation and the necessary operands

### 5.2 Decode Stage: Block Diagram



### 5.2 Element: Register File

- A collection of 32 registers:
  - Each 32-bit wide; can be read/written by specifying register number
  - Read at most two registers per instruction
  - Write at most one register per instruction
- RegWrite is a control signal to indicate:
  - Writing of register
  - 1(True) = Write, 0 (False) = No Write



# 5.2 Decode Stage: R-Format Instruction



# 5.2 Decode Stage: I-Format Instruction



### 5.2 Decode Stage: Choice in Destination



## 5.2 Multiplexer

#### Function:

Selects one input from multiple input lines

### Inputs:

n lines of same width

#### Control:

• **m** bits where  $n = 2^m$ 

### Output:

Select ith input line if control = i



Control=0  $\rightarrow$  select in<sub>0</sub> to out Control=3  $\rightarrow$  select in<sub>3</sub> to out

# 5.2 Decode Stage: Choice in Data 2



### 5.2 Decode Stage: Load Word Instruction



### 5.2 Decode Stage: Branch Instruction



## 5.2 **Decode Stage**: Summary



# 5.3 ALU Stage: Requirements

- Instruction ALU Stage:
  - ALU = Arithmetic-Logic Unit
  - Also called the Execution stage
  - Perform the real work for most instructions here
    - Arithmetic (e.g. add, sub), Shifting (e.g. s11), Logical (e.g. and, or)
    - Memory operation (e.g. 1w, sw): Address calculation
    - Branch operation (e.g. bne, beq): Perform register comparison and target address calculation
- Input from previous stage (Decode):
  - Operation and Operand(s)
- Output to the next stage (Memory):
  - Calculation result

- 1. Fetch
- 2. Decode
- 3. ALU
- 4. Memory
- 5. RegWrite

## 5.3 ALU Stage: Block Diagram



### 5.3 Element: Arithmetic Logic Unit

### ALU (Arithmetic Logic Unit)

 Combinational logic to implement arithmetic and logical operations

### Inputs:

Two 32-bit numbers

#### Control:

4-bit to decide the particular operation

### Outputs:

- Result of arithmetic/logical operation
- A 1-bit signal to indicate whether result is zero



| ALUcontrol | Function |
|------------|----------|
| 0000       | AND      |
| 0001       | OR       |
| 0010       | add      |
| 0110       | subtract |
| 0111       | slt      |
| 1100       | NOR      |

### 5.3 ALU Stage: Non-Branch Instructions

We can handle non-branch instructions easily:



### 5.3 ALU Stage: Branch Instructions

- Branch instruction is harder as we need to perform two calculations:
- Example: "beq \$9, \$0, 3"

#### 1. Branch Outcome:

- Use ALU to compare the register
- The 1-bit "isZero?" signal is enough to handle equal/not equal check (how?)

### 2. Branch Target Address:

- Introduce additional logic to calculate the address
- Need PC (from Fetch Stage)
- Need **Offset** (from Decode Stage)

### **Complete ALU Stage**



# 5.4 Memory Stage: Requirement

- 1. Fetch
- 2. Decode
- 3. ALU
- 4. Memory
- 5. RegWrite

- Instruction Memory Access Stage:
  - Only the load and store instructions need to perform operation in this stage:
    - Use memory address calculated by ALU Stage
    - Read from or write to data memory
  - All other instructions remain idle
    - Result from ALU Stage will pass through to be used in Register Write stage (see section 5.5) if applicable
- Input from previous stage (ALU):
  - Computation result to be used as memory address (if applicable)
- Output to the next stage (Register Write):
  - Result to be stored (if applicable)

## 5.4 Memory Stage: Block Diagram



### 5.4 Element: Data Memory

Storage element for the data of a program

### Inputs:

- Memory Address
- Data to be written (Write Data) for store instructions

#### Control:

 Read and Write controls; only one can be asserted at any point of time

### Output:

Data read from memory (Read Data) for load instructions



# 5.4 Memory Stage: Load Instruction

Only relevant parts of Decode and ALU Stages are shown



# 5.4 Memory Stage: Store Instruction

Need Read Data 2 (from Decode stage) as the Write Data



# 5.4 Memory Stage: Non-Memory Inst.

Add a multiplexer to choose the result to be stored



## 5.5 Register Write Stage: Requireme

- 1. Fetch
- 2. Decode
- 3. ALU
- 4. Memory
- 5. RegWrite

- Instruction Register Write Stage:
  - Most instructions write the result of some computation into a register
    - Examples: arithmetic, logical, shifts, loads, set-less-than
    - Need destination register number and computation result
  - Exceptions are stores, branches, jumps:
    - There are no results to be written
    - These instructions remain idle in this stage
- Input from previous stage (Memory):
  - Computation result either from memory or ALU

## 5.5 Register Write Stage: Block Diagram



- Result Write stage has no additional element:
  - Basically just route the correct result into register file
  - The Write Register number is generated way back in the Decode Stage

# 5.5 Register Write Stage: Routing

add \$8, \$9, \$10



# 6. The Complete Datapath!

- We have just finished "designing" the datapath for a subset of MIPS instructions:
  - Shifting and Jump are not supported
- Check your understanding:
  - Take the complete datapath and play the role of controller:
    - See how supported instructions are executed
    - Figure out the correct control signals for the datapath elements
- Coming up next lecture: Control



## 7. Brief Recap (1/4)

Lecture #7, Slide 4



0000001111100000000000000000001000

Write program in high-level language (e.g., C)

```
if(x != 0) {
  a[0] = a[1] + x;
```

## 7. Brief Recap (2/4)

### Lecture #7, Slide 4



Compiler translates to assembly language (e.g., MIPS)

```
beq $16, $0, Else
      lw $8, 4($17)
      add $8, $8, $16
      sw $8, 0($17)
Else:
```

## 7. Brief Recap (3/4)

### Lecture #7, Slide 4



Assembler translates to machine code (i.e., binaries)

```
0001 0010 0000 0000
0000 0000 0000 0011
1000 1110 0010 1000
0000 0000 0000 0100
0000 0010 0000 1000
0100 0000 0001 0100
1010 1110 0010 1000
0000 0000 0000 0000
```

## 7. Brief Recap (4/4)

### Lecture #7, Slide 4



0000001111100000000000000000001000

Processor executes the machine code (i.e., binaries)



### 8. From C to Execution

- We play the role of Programmer, Compiler, Assembler, and Processor
  - Program:

```
if(x != 0) {
  a[0] = a[1] + x;
}
```

- Programmer:
  - Show the workflow of compiling, assembling, and executing C program
- Compiler:
  - Show how the program is compiled into MIPS
- Assembler:
  - Show how the MIPS is translated into binaries
- Processor:
  - Show how the datapath is activated in the processor

## 8.1 Writing C Program

Edit, Compile, Execute: Lecture #2, Slide 5



## 8.2 Compiling to MIPS (1/7)

recap.c if(x != 0) { a[0] = a[1] + x; }

Key Idea #1:

Compilation is a structured process

```
if(x != 0) {
  a[0] = a[1] + x;
}
```

Each structure can be compiled independently

#### **Inner Structure**

#### **Outer Structure**

```
if(x != 0) {
}
```

### 8.2 Compiling to MIPS (2/7)

Key Idea #2: Variable-to-Register Mapping

```
if(x != 0) {
  a[0] = a[1] + x;
}
```

Let the mapping be:

| Variable | Register Name | Register Number |
|----------|---------------|-----------------|
| X        | \$s0          | \$16            |
| а        | \$s1          | \$17            |

```
recap.c

if(x != 0) {

a[0] =

a[1] + x;

}
```

## 8.2 Compiling to MIPS (3/7)

Mapping: r

a: \$17

recap.c if(x != 0) { a[0] = a[1] + x;

 Common Technique #1:
 Invert the condition for shorter code (Lecture #8, Slide 22)

#### **Outer Structure**

#### **Outer MIPS Code**

```
if(x != 0) {
}
```

```
beq $16, $0, Else

# Inner Structure

Else:
```

## 8.2 Compiling to MIPS (4/7)

Mapping:

x: \$16 a: \$17 \$t1: \$8

```
recap.c

if(x != 0) {

a[0] =

a[1] + x;

}
```

Common Technique #2:
 Break complex operations, use temp register (Lecture #7, Slide 29)

#### **Inner Structure**

### **Simplified Inner Structure**

```
a[0] = a[1] + x;
```

```
$t1 = a[1];
$t1 = $t1 + x;
a[0] = $t1;
```

## 8.2 Compiling to MIPS (5/7)

Mapping:

x: \$16 a: \$17 \$t1: \$8 recap.c if(x != 0) { a[0] = a[1] + x; }

Common Technique #3:
 Array access is 1w, array update is sw
 (Lecture #8, Slide 13)

### **Simplified Inner Structure**

### **Inner MIPS Code**

```
$t1 = a[1];
$t1 = $t1 + x;
a[0] = $t1;
```

```
lw $8, 4($17)
add $8, $8, $16
sw $8, 0($17)
```

## 8.2 Compiling to MIPS (6/7)

### Mapping:

x: \$16 a: \$17 \$t1: \$8

```
recap.c

if(x != 0) {

a[0] =

a[1] + x;
```

Common Error:

Assume that the address of the next word can be found by incrementing the address in a register by 1 instead of by the word size in bytes

### Example:

```
$\fint$ $\fint$ = a[1];
is translated to
lw $\fint$8, 4($17)
instead of
lw $\fint$8, 1($17)
```

## 8.2 Compiling to MIPS (7/7)

### Mapping:

x: \$16 a: \$17 \$t1: \$8

```
recap.c

if(x != 0) {

a[0] =

a[1] + x;

}
```

### Last Step:

Combine the two structures logically

#### **Inner MIPS Code**

# lw \$8, 4(\$17) add \$8, \$8, \$16

sw \$8, 0(\$17)

#### **Outer MIPS Code**

```
beq $16, $0, Else

# Inner Structure

Else:
```

#### **Combined MIPS Code**

```
beq $16, $0, Else
lw $8, 4($17)
add $8, $8, $16
sw $8, 0($17)
Else:
```

## 8.3 Assembling to Binary (1/6)

### recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

### Instruction Types Used:

- 1. R-Format: (Lecture #9, Slide 8)
  - opcode \$rd, \$rs, \$rt

| 6      | 5  | 5  | 5  | 5     | 6     |
|--------|----|----|----|-------|-------|
| opcode | rs | rt | rd | shamt | funct |

- 2. I-Format: (Lecture #9, Slide 14)
  - opcode \$rt, \$rs, immediate

| 6      | 5  | 5  | 16        |
|--------|----|----|-----------|
| opcode | rs | rt | immediate |

- 3. Branch: (Lecture #9, Slide 22)
  - Uses I-Format
  - PC = (PC + 4) + (immediate × 4)

### recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

## 8.3 Assembling to Binary (2/6)

- beq \$16, \$0, Else
  - Compute immediate value (Lecture #9, Slide 27)
    - immediate = 3
  - Fill in fields (refer to MIPS Reference Data)

```
    6
    5
    5
    16

    4
    16
    0
    3
```

Convert to binary

000100 10000 00000 00000000000011

```
beq $16, $0, Else

lw $8, 4($17)

add $8, $8, $16

sw $8, 0($17)

Else:
```

### recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

## 8.3 Assembling to Binary (3/6)

- lw \$8, 4(\$17)
  - Fill in fields (refer to MIPS Reference Data)

|   | 6                                   | 5` | 5 | 16 |  |
|---|-------------------------------------|----|---|----|--|
|   | 35                                  | 17 | 8 | 4  |  |
| • | <ul><li>Convert to binary</li></ul> |    |   |    |  |

```
0001 0010 0000 0000 0000 0000 0000 0011

lw $8, 4($17)

add $8, $8, $16

sw $8, 0($17)

Else:
```

### recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

## 8.3 Assembling to Binary (4/6)

- add \$8, \$8, \$16
  - Fill in fields (refer to MIPS Reference Data)

| 6 | 5` | 5  | 5 | <u> </u> | 6  |
|---|----|----|---|----------|----|
| 0 | 8  | 16 | 8 | 0        | 32 |

Convert to binary

| 000000 | 01000 | 10000 | 01000 | 00000 | 100000 |
|--------|-------|-------|-------|-------|--------|
|        |       |       |       |       |        |

```
0001 0010 0000 0000 0000 0000 0000 0011
1000 1110 0010 1000 0000 0000 0000 0100
add $8, $8, $16
sw $8, 0($17)
Else:
```

# 8.3 Assembling to Binary (5/6)

recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

```
■ sw $8, 0($17)
```

Fill in fields (refer to MIPS Reference Data)

| 6     | 5`          | 5 | 16 ´ |
|-------|-------------|---|------|
| 43    | 17          | 8 | 0    |
| Conve | rt to binar | У |      |
|       |             |   |      |

| 101011 | 10001 | 01000 | 00000000000000 |
|--------|-------|-------|----------------|
| LOTOTT | TOOOT | 01000 | 00000000000000 |

```
0001 0010 0000 0000 0000 0000 0000 0011 1000 1110 0010 1000 0000 0000 0000 0100 0000 0000 0000 0000 sw $8, 0($17) Else:
```

## 8.3 Assembling to Binary (6/6)

### Final Binary

- Hard to read?
- Don't worry, this is intended for machine not for human!

#### recap.mips

beq \$16, \$0, Else lw \$8, 4(\$17) add \$8, \$8, \$16 sw \$8, 0(\$17)

Else:

```
      0001
      0010
      0000
      0000
      0000
      0000
      0000
      0011

      1000
      1110
      0010
      1000
      0000
      0000
      0000
      0000
      0100

      0000
      0001
      0001
      0000
      0000
      0000
      0000
      0000
      0000

      1010
      1110
      0010
      1000
      0000
      0000
      0000
      0000
      0000
```

### 8.4 Execution (Datapath)

### Given the binary

Assume two possible executions:

```
1. $16 == $0 (shorter)
2. $16 != $0 (Longer)
```

Convention:

```
Fetch: Memory:

Decode: Reg Write: Other:
```

```
0010
          0000
               0000
                    0000
                          0000
                                0000
          0010
               1000
                     0000
                          0000
1000
     1110
                                0000
                                     0100
     0001 0001
               0000
                     0100
                          0000
                                0010
0000
                                     0000
     1110 0010 1000 0000 0000 0000 0000
```











# Reading

- The Processor: Datapath and Control
  - COD Chapter 5 Sections 5.1 5.3 (3<sup>rd</sup> edition)
  - COD Chapter 4 Sections 4.1 4.3 (4<sup>th</sup> edition)



# End of File