#### Announcements

- Lab 7 meets Fri / Mon
- P2
  - part L is due Thursday next week
- Homework 3 Due Mon 11/6

#### **Pipelining**

- Want to execute an instruction?
  - Build a processor (multi-cycle)
  - Find instructions
  - Line up instructions (1, 2, 3, ...)
  - Overlap execution
    - Cycle #1: Fetch 1
    - Cycle #2: Decode 1, Fetch 2
    - Cycle #3: ALU 1, Decode 2, Fetch 3
  - This is called pipelining instruction execution.
  - Used extensively for the first time on IBM 360 (1960s).
  - CPI approaches 1.
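To see why CPI approaches 1, a rough cycle count helps. The sketch below assumes an idealized pipeline with no stalls, so that n instructions on a k-stage pipeline finish in k + n - 1 cycles. The function name and the 5-stage default are illustrative assumptions, not something defined in these notes.

```python
def pipelined_cpi(num_instructions: int, num_stages: int = 5) -> float:
    """Cycles per instruction for an ideal pipeline with no stalls."""
    # The first instruction needs num_stages cycles; every later one
    # finishes one cycle after its predecessor.
    total_cycles = num_stages + num_instructions - 1
    return total_cycles / num_instructions


for n in (1, 5, 100, 10_000):
    print(n, round(pipelined_cpi(n), 4))
```

The fill cost of k - 1 cycles is paid once, so it is amortized away as the instruction count grows.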




#### Sample Code (Simple)

Let's run the following code on pipelined LC2K:

```
    add 1 2 3 ; reg 3 = reg 1 + reg 2
    nor 4 5 6 ; reg 6 = reg 4 nor reg 5
    lw 2 4 20 ; reg 4 = Mem[reg2+20]
    add 2 5 5 ; reg 5 = reg 2 + reg 5
    sw 3 7 10 ; Mem[reg3+10] = reg 7
```
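Since none of these five instructions reads a register written by one of the two instructions just before it, the program flows through the pipeline without stalls. A minimal sketch of which instruction sits in which stage during each cycle, assuming instruction i is fetched in cycle i + 1 (the `stage_occupancy` helper is hypothetical, written only for illustration):

```python
STAGES = ["IF", "ID", "EX", "ME", "WB"]

program = [
    "add 1 2 3",
    "nor 4 5 6",
    "lw 2 4 20",
    "add 2 5 5",
    "sw 3 7 10",
]


def stage_occupancy(program, cycle):
    """Map each stage to the instruction in it during `cycle` (1-based).

    Assumes no stalls: instruction i (0-based) is fetched in cycle i + 1.
    """
    occupancy = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s  # index of the instruction currently in this stage
        occupancy[stage] = program[i] if 0 <= i < len(program) else None
    return occupancy


# The last instruction leaves WB in cycle len(program) + len(STAGES) - 1.
for cycle in range(1, len(program) + len(STAGES)):
    print(cycle, stage_occupancy(program, cycle))
```

In cycle 2 this reports `nor 4 5 6` in IF and `add 1 2 3` in ID, with the later stages still empty.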

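Time 2 - Fetch: nor 4 5 6

[Figure: pipeline datapath at time 2, with pipeline registers IF/ID, ID/EX, EX/Mem, and Mem/WB. nor 4 5 6 is being fetched while add 1 2 3 is in decode.]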







#### Time 8 - no more instructions

#### Pipelining - What can go wrong?

- Data hazards: since register reads occur in stage 2 and register writes occur in stage 5, it is possible to read an old / stale value before the correct value is written back.
- Control hazards: A branch instruction may change the PC, but not until stage 4. What do we fetch before that?
- Exceptions: Sometimes we need to pause execution, switch to another task (maybe the OS), and then resume execution. How do we make sure we resume at the right spot?
- Now: data hazards
  - What are they?
  - How do you detect them?
  - How do you deal with them?



#### Pipeline function for ADD

- Fetch: read instruction from memory
- Decode: read source operands from reg
- Execute: calculate sum
- Memory: pass results to next stage
- Writeback: write sum into register file

#### Data Hazards



If not careful, nor will read a stale value of register 3





#### Data Hazards



Assume the register file gives the right value of register 3 when it is read and written during the same cycle. This is consistent with most processors (ARM/x86).

#### **Definitions**

- Data Dependency: one instruction uses the result of a previous one
  - Doesn't necessarily cause a problem
- Data Hazard: one instruction has a data dependency that will cause a problem if we don't "deal with it"
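The difference between the two definitions can be sketched in code. The example below assumes LC2K field conventions (add/nor read regA and regB and write destReg; lw reads regA and writes regB; sw and beq read both) and assumes the register file returns the new value when a register is read and written in the same cycle. Under that assumption a dependency is a hazard in the 5-stage pipeline only when the consumer is 1 or 2 instructions behind the producer. The tables and the `raw_dependencies` function are hypothetical helpers for illustration.

```python
# Which fields each LC2K opcode reads and writes (1-based positions after
# the opcode). Only the opcodes used in these notes are covered.
SOURCES = {"add": (1, 2), "nor": (1, 2), "lw": (1,), "sw": (1, 2), "beq": (1, 2)}
DEST = {"add": 3, "nor": 3, "lw": 2}


def raw_dependencies(program, hazard_distance=2):
    """List RAW dependencies as (consumer, producer, register, is_hazard).

    Instruction numbers are 1-based. A dependency counts as a hazard when
    the consumer is at most `hazard_distance` instructions after the
    producer.
    """
    last_writer = {}  # register -> number of the latest instruction writing it
    found = []
    for number, text in enumerate(program, start=1):
        parts = text.split()
        sources = {int(parts[k]) for k in SOURCES.get(parts[0], ())}
        for reg in sorted(sources):
            if reg in last_writer:
                producer = last_writer[reg]
                is_hazard = number - producer <= hazard_distance
                found.append((number, producer, reg, is_hazard))
        if parts[0] in DEST:
            last_writer[int(parts[DEST[parts[0]]])] = number
    return found


program = ["add 1 2 3", "nor 3 4 5", "noop", "lw 3 6 10"]
for consumer, producer, reg, is_hazard in raw_dependencies(program):
    kind = "hazard" if is_hazard else "dependency only"
    print(f"instr {consumer} reads reg {reg} written by instr {producer}: {kind}")
```

Here the nor is a hazard (distance 1), while the lw depends on the same add but is far enough behind it (distance 3) that the dependency causes no problem.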

#### Class Problem 1

Poll: Which of these instructions has a data dependency on an earlier one? Which of those are data hazards in our 5-stage pipeline?

- 1. add 1 2 3
- 2. nor 3 4 5
- 3. add 6 3 7
- 4. lw 3 6 10
- 5. sw 6 2 12

#### What about here?

- 1. add 1 2 3
- 2. beq 3 4 1
- 3. add 3 5 6
- 4. add 3 6 7

#### Class Problem 1

Which read-after-write (RAW) dependences do you see? Which of those are data hazards?

- 1. add 1 2 3
- 2. nor 3 4 5
- 3. add 6 3 7
- 4. lw 3 6 10
- 5. sw 6 2 12

#### What about here?

- 1. add 1 2 3
- 2. beq 3 4 1
- 3. add 3 5 6
- 4. add 3 6 7





#### Three approaches to handling data hazards

- Avoid
  - Make sure there are no hazards in the code
- Detect and Stall
  - If hazards exist, stall the processor until they go away.
- Detect and Forward
  - If hazards exist, fix up the pipeline to get the correct value (if possible)

#### Handling data hazards I: Avoid all hazards

- Assume the programmer (or the compiler) knows about the processor implementation.
  - Make sure no hazards exist.
    - Put noops between any dependent instructions.

- add 1 2 3 (writes register 3 in cycle 5)
- noop
- noop
- dependent instruction, e.g. nor 3 4 5 (reads register 3 in cycle 5)
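What a compiler or assembler pass would have to do for the avoid approach can be sketched as follows. This is a minimal illustration, not a real tool: it assumes LC2K field conventions, assumes the register file returns the new value when read and written in the same cycle (so a consumer must be at least 3 instructions behind its producer), and ignores the fact that inserting noops would also require fixing up branch offsets and labels. `insert_noops` is a hypothetical name.

```python
# Which fields each LC2K opcode reads and writes (1-based positions after
# the opcode). Only the opcodes used in these notes are covered.
SOURCES = {"add": (1, 2), "nor": (1, 2), "lw": (1,), "sw": (1, 2), "beq": (1, 2)}
DEST = {"add": 3, "nor": 3, "lw": 2}


def insert_noops(program, min_distance=3):
    """Pad `program` with noops so no consumer is closer than
    `min_distance` instructions to the producer of a register it reads."""
    out = []
    last_write_pos = {}  # register -> index in `out` of its latest writer
    for text in program:
        parts = text.split()
        sources = {int(parts[k]) for k in SOURCES.get(parts[0], ())}
        # Earliest slot at which every source register is safe to read.
        earliest = max(
            (last_write_pos[r] + min_distance for r in sources if r in last_write_pos),
            default=0,
        )
        while len(out) < earliest:
            out.append("noop")
        if parts[0] in DEST:
            last_write_pos[int(parts[DEST[parts[0]]])] = len(out)
        out.append(text)
    return out


print(insert_noops(["add 1 2 3", "nor 3 4 5"]))
```

For the add/nor pair this yields the add, two noops, then the nor. Note that `min_distance` is baked into the emitted program, which is why a longer pipeline would break code padded this way.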





#### Problems with this solution

- Old programs (legacy code) may not run correctly on new implementations
  - Longer pipelines need more noops
- Programs get larger as noops are included
  - Especially a problem for machines that try to execute more than one instruction every cycle
  - Intel EPIC: Often 25% to 40% of instructions are noops
- Program execution is slower
  - CPI is 1, but some instructions are noops

#### Handling data hazards II: Detect and stall until ready

- Detect:
  - Compare regA with previous DestRegs
    - 3 bit operand fields
  - Compare regB with previous DestRegs
    - 3 bit operand fields
- Stall:
  - Keep current instructions in fetch and decode
  - Pass a noop to execute
- How do we modify the pipeline to do this?











#### Example

- Let's run this program with a data hazard through our 5-stage pipeline:
  - add 1 2 3
  - nor 3 4 5
- We will start at the beginning of cycle 3, where add is in the EX stage and nor is in the ID stage, about to read a register value.













#### Time Graph

| Time:     | 1  | 2  | 3   | 4   | 5  | 6  | 7  | 8  | 9 | 10 | 11 | 12 | 13 |
|-----------|----|----|-----|-----|----|----|----|----|---|----|----|----|----|
| add 1 2 3 | IF | ID | EX  | ME  | WB |    |    |    |   |    |    |    |    |
| nor 3 4 5 |    | IF | ID* | ID* | ID | EX | ME | WB |   |    |    |    |    |
| add 6 3 7 |    |    |     |     |    |    |    |    |   |    |    |    |    |
| lw 3 6 10 |    |    |     |     |    |    |    |    |   |    |    |    |    |
| sw 6 2 12 |    |    |     |     |    |    |    |    |   |    |    |    |    |
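The remaining rows can be checked with a small simulation. The sketch below is one way to model detect and stall, under these assumptions: no forwarding, an instruction waits in ID until every source register has been written back, the register file returns the new value when read and written in the same cycle, and the instruction behind a stalled one waits in IF. `stall_timeline` and its lookup tables are hypothetical helpers, and `*` marks a stalled cycle as in the table.

```python
# Which fields each LC2K opcode reads and writes (1-based positions after
# the opcode). Only the opcodes used in these notes are covered.
SOURCES = {"add": (1, 2), "nor": (1, 2), "lw": (1,), "sw": (1, 2), "beq": (1, 2)}
DEST = {"add": 3, "nor": 3, "lw": 2}


def stall_timeline(program):
    """Return one {cycle: label} dict per instruction for detect-and-stall."""
    writeback_cycle = {}      # register -> cycle in which it is written back
    rows = []
    prev_id = prev_ex = None  # cycles at which the previous instruction entered ID / EX
    for text in program:
        parts = text.split()
        # Enter IF when the previous instruction moves on to ID, and enter
        # ID when the previous instruction moves on to EX.
        if_enter = 1 if prev_id is None else prev_id
        id_enter = if_enter + 1 if prev_ex is None else prev_ex
        # Leave ID only once every source register has been written back.
        sources = [int(parts[k]) for k in SOURCES.get(parts[0], ())]
        ready = max((writeback_cycle.get(r, 0) for r in sources), default=0)
        ex_enter = max(id_enter + 1, ready + 1)
        row = {}
        for c in range(if_enter, id_enter):
            row[c] = "IF" if c == id_enter - 1 else "IF*"
        for c in range(id_enter, ex_enter):
            row[c] = "ID" if c == ex_enter - 1 else "ID*"
        row[ex_enter], row[ex_enter + 1], row[ex_enter + 2] = "EX", "ME", "WB"
        if parts[0] in DEST:
            writeback_cycle[int(parts[DEST[parts[0]]])] = ex_enter + 2
        rows.append(row)
        prev_id, prev_ex = id_enter, ex_enter
    return rows


program = ["add 1 2 3", "nor 3 4 5", "add 6 3 7", "lw 3 6 10", "sw 6 2 12"]
rows = stall_timeline(program)
last_cycle = max(max(row) for row in rows)
for text, row in zip(program, rows):
    cells = " ".join(row.get(c, "").ljust(3) for c in range(1, last_cycle + 1))
    print(text.ljust(10), cells)
```

For the nor row this reproduces the entries already filled in above: IF in cycle 2, ID held through cycles 3 to 5, then EX, ME, WB.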

#### Solution

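| Time:     | 1  | 2  | 3   | 4   | 5  | 6  | 7  | 8   | 9   | 10 | 11 | 12 | 13 |
|-----------|----|----|-----|-----|----|----|----|-----|-----|----|----|----|----|
| add 1 2 3 | IF | ID | EX  | ME  | WB |    |    |     |     |    |    |    |    |
| nor 3 4 5 |    | IF | ID* | ID* | ID | EX | ME | WB  |     |    |    |    |    |
| add 6 3 7 |    |    | IF* | IF* | IF | ID | EX | ME  | WB  |    |    |    |    |
| lw 3 6 10 |    |    |     |     |    | IF | ID | EX  | ME  | WB |    |    |    |
| sw 6 2 12 |    |    |     |     |    |    | IF | ID* | ID* | ID | EX | ME | WB |

Poll: Which problems does "detect and stall" fix over "avoid hazards"? (select all)

1. Breaking backwards compatibility
2. Larger programs
3. Slower programs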