

Seongil Wi



#### HW2

2

Due date: 11/26, 11:59PM

#### Quiz 2



This is the last quiz of this semester

• Date: 11/26 (TUE.), Class time

- Scope:
  - Logic Design Basics
  - Processor (1) ~ Processor (6)
- Brint your own pen!
- T/F problems + 2~4 computation problems

#### Where are We?



Situations that prevent starting the next instruction in the next cycle

#### Hazard #1: Structural hazard

Conflict for use of a hardware resource

#### Solution:

- Stall
- Resource duplication

## Hazard #2: Data hazard

An instruction cannot execute because data is not yet available

#### **Solution:**

- Stall
- Forwarding
- Compiler optimization

#### Hazard #3: Control hazard

## Recap: Structural Hazard

The **hardware** cannot support the combination of instructions



#### **Recap: Pipeline Stall**





## Recap: Resource Duplication





#### Recap: Data Hazard



A planned instruction cannot execute because data is not yet available



The data of \$s0 calculated in the previous instruction is needed in the 3rd clock cycle

#### **Recap: Pipeline Stall**



## A delay in execution (2 clock cycle)



#### 1

### **Recap: Forwarding**

\*

Use result when it is computed!



### Recap: Compiler Optimization

- Reorder code to avoid use of load result in the next instruction
- C code for v[3] = v[0] + v[1]; v[4] = v[0] + v[2];

```
lw $t1, 0($t0)
lw $t2, 4($t0)
1 stall add $t3, $t1; $t2
sw $t3, 12($t0)
lw $t4 8($t0)
1 stall add $t5, $t1; $t4
sw $t5, 16($t0)
```

Idea: code reordering to avoid stalls (by compiler)

Compiler requires knowledge of the pipeline structure!

13 cycles





Situations that prevent starting the next instruction in the next cycle

#### Hazard #1: Structural hazard

Conflict for use of a hardware resource

#### Solution:

- Stall
- Resource duplication

## Hazard #2: Data hazard

An instruction cannot execute because data is not yet available

#### Hazard #3: Control hazard

#### Solution:

- Stall
- Forwarding
- Compiler optimization

Let's look at the hardware details

#### **Data Hazard**



A planned instruction cannot execute because data is not yet available



The data of \$s0 is needed at this time

## Data Hazard #1: EX Hazard

A planned instruction cannot execute because data is not yet available



### Data Hazard #2: MEM Hazard

A planned instruction cannot execute because data is not yet available



### **Solution: Forwarding**

- \*

Use result when it is computed!



### **Today's Topic**



- \*
- How should the hardware be modified to support forwarding?
- How should the hardware be modified to detect data hazards?
- How should the hardware be modified to support stalls?

# Forwarding Paths Overview



## Forwarding Paths Overview



## Forwarding Paths: Details



#### 21

## Forwarding Paths: Details









How to detecting the need to forward?





ID/EX.RegisterRs ↓

ID/EX.RegisterRt

EX/MEM.RegisterRd | MEM/WB.RegisterRd

EX hazard detection! First operand ← EX/MEM.RegisterRd



EX hazard detection! Second operand ← EX/MEM.RegisterRd



MEM hazard detection! First operand ← MEM/WB.RegisterRd



MEM hazard detection! Second operand ← MEM/WB.RegisterRd



#### **Additional Conditions**





- But only if forwarding instruction will write to a register!
  - -EX/MEM.RegWrite = 1
  - -MEM/WB.RegWrite = 1
- And only if rd for that instruction is not \$zero
  - -EX/MEM.RegisterRd ≠ 0
  - -MEM/WB.RegisterRd ≠ 0

30

#### Detecting the Need to Forward: Details



ID/EX.RegisterRs

#### Detecting the Need to Forward: Details











#### Detecting the Need to Forward: Details





## **Summary: Forwarding Conditions**

- EX hazard: Forward <u>EX/MEM value</u> to EX stage
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
- MEM hazard: Forward <u>MEM/WB value</u> to EX stage
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

#### 35

#### **Double Data Hazard**



add \$1, \$2, \$3 add \$1, \$4, \$5 add \$6, \$1, \$7



Any problem?

#### **Double Data Hazard**



#### **Double Data Hazard: Problem**





#### **Double Data Hazard: Problem**



#### **Double Data Hazard: Problem**



## Revise the MEM Hazard Condition

- Only forward <u>if EX hazard condition isn't true</u>
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)

and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRs))

and (MEM/WB.RegisterRd = ID/EX.RegisterRs))

ForwardA = 01

- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)

and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)

and (EX/MEM.RegisterRd = ID/EX.RegisterRt))

and (MEM/WB.RegisterRd = ID/EX.RegisterRt))

ForwardB = 01





## Recall: Three Types of Data Hazard

1. EX Hazard

2. MEM Hazard

3. Load-Use Hazard

## Load-Use Data Hazard with Forwarding



forwarding when a load tries to use the data

### Load-Use Data Hazard Detection: Basic Idea

Load-Use hazard detection!

→ Insert bubble for one clock cycle



### Load-Use Data Hazard Detection: Basic Idea

Load-Use hazard detection!

→ Insert bubble for one clock cycle



#### **Additional Condition**



- But only if the instruction the ID/EX stage is a load instruction
  - -ID/EX.MemRead = 1

## Load-Use Data Hazard Detection: Basic Idea

```
if ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
    (ID/EX.RegisterRt = IF/ID.RegisterRt))
Insert bubble for one clock cycle (Stall)
```

### Load-Use Data Hazard Detection: Details 48



### Load-Use Data Hazard Detection: Details 49



## Load-Use Data Hazard Detection: Basic Idea

if ID/EX.MemRead and
 ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
 (ID/EX.RegisterRt = IF/ID.RegisterRt))

Insert bubble for one clock cycle (Stall)

How to stall the pipeline?

## How to Stall the Pipeline?

1) Prevent update of PC and IF/ID register

2) Force control values in ID/EX register to 0

## **Stall Details: Current Cycle**



## 1) Prevent Update of PC and IF/ID Register



## 2) Force Control Values in ID/EX Register to 0



## Stall Details: Next Cycle (Stall Inserted)



## Stall Details: Next Next Cycle...



## Summary: Stall the Pipeline

- 1) Prevent update of PC and IF/ID register
  - Same instruction is decoded again
  - Same instruction (with same PC) is fetched again
- 2) Force control values in ID/EX register to 0
  - -EX, MEM and WB do nop (no-operation)

## Pipelining Hazards Summary

Situations that prevent starting the next instruction in the next cycle

#### Hazard #1: Structural hazard

Conflict for use of a hardware resource

#### Solution:

- Stall
- Resource duplication

## Hazard #2: Data hazard

An instruction cannot execute because data is not yet available

#### Solution:

- Stall
- Forwarding
- Compiler optimization

#### Hazard #3: Control hazard

## **Control Hazard**

## Recall: Branch Outcome Determined in MEM



#### **Control Hazard**





- Branch determines flow of control
  - -Fetching next instruction depends on branch outcome
  - -However, the branch decision is made in the MEM stage



```
beq $1, $3, 28
and $12, $2, $5
or $13, $6, $2
                     Control flow
add $14, $2, $2
lw $4, 50($7)
```









## Pipelining Hazards Summary

Situations that prevent starting the next instruction in the next cycle

## Hazard #1: Structural hazard

Conflict for use of a hardware resource

#### Solution:

- Stall
- Resource duplication

## Hazard #2: Data hazard

An instruction cannot execute because data is not yet available

#### **Solution:**

- Stall
- Forwarding
- Compiler optimization

#### Hazard #3: Control hazard

The next instruction is uncertain due to branching

#### Solution:

- Stall
- Optimized branch processing
- Branch prediction
- Delayed branch

## Solution for Control Hazard: Stall



- Most naïve approach: stall on branch
  - Wait until branch outcome determined before fetching next instruction

| Branch instruction | IF | ID | EX    | MEM   | WB |    |    |     |     |     |
|--------------------|----|----|-------|-------|----|----|----|-----|-----|-----|
| Branch successor   |    | IF | stall | stall | IF | ID | EX | MEM | WB  |     |
| Branch successor+1 |    |    |       |       |    | IF | ID | EX  | MEM | WB  |
| Branch successor+2 |    |    |       |       |    |    | IF | ID  | EX  | MEM |
| Branch successor+3 |    |    |       |       |    |    |    | IF  | ID  | EX  |
| Branch successor+4 |    |    |       |       |    |    |    |     | IF  | ID  |
| Branch successor+5 |    |    |       |       |    |    |    |     |     | IF  |

## Solution for Control Hazard: Stall

- Most naïve approach: stall on branch
  - Wait until branch outcome determined before fetching next instruction



Work correctly, but is slow (3 cycle penalty regardless of whether a branch occurs)

## Solution for Control Hazard: Stall

- For 5-stage pipeline, 3 cycle penalty for stall
- 15% branch frequency

What is the CPI?

$$1 + (0.15 \times 3) = 1.45$$

CPI in the ideal situation of pipelining

3 cycle penalty

### Solution for Control Hazard: Stall

70

- Most naïve approach: stall on branch
  - Wait until branch outcome determined before fetching next instruction



Work correctly, but is slow (3 cycle penalty regardless of whether a branch occurs)

## Solution for Control Hazard: Stall





# How can the penalty be reduced to less than 3 clock cycles?

Optimized branch processing!
(H/W modification to achieve
3 cycle penalty → 1 cycle penalty)

## Optimized Branch Processing: Observation



#### Optimized Branch Processing: Idea

Move hardware to determine branch outcome to the ID stage



# Solution for Control Hazard: Optimized \* Branch Processing

Move hardware to determine branch outcome to the ID stage



#### Solution for Control Hazard: Optimized **Branch Processing**

Move hardware to determine branch outcome to the ID stage



- Bringing in the entire ALU is expensive (non-sense at all)
- Deciding whether to branch can be done with just an equality check

## Solution for Control Hazard: Optimized \* Branch Processing

Move hardware to determine branch outcome to the ID stage

- -Compute branch target address early
- -Find out branch taken or not early
  - Extra lightweight hardware for equality check
  - Equality can be tested by first XORing inputs and then ORing all the results

Reduce branch delay to one clock cycle (Flush one instruction if the branch is taken)

### Optimized Branch Processing: Flushing Logic



#### **Optimized Branch Processing: Example**

```
36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or $13, $2, $6
52: add $14, $4, $2
                         Control flow
56: slt $15, $6, $7
        $4, 50($7)
72: lw
```

#### **Optimized Branch Processing: 3rd Cycle**



#### **Optimized Branch Processing: 3rd Cycle**



#### **Optimized Branch Processing: 3rd Cycle**



#### Optimized Branch Processing: 4th Cycle

and inst. was discarded from pipeline (only one cycle penalty)



## Solution for Control Hazard: Optimized \* Branch Processing



If there is a branch instruction, there is still a 1-cycle penalty.

How can performance be further improved?

**Branch prediction!** 

### Solution for Control Hazard: Branch Prediction

Guess one direction then backup if wrong

-Our prediction so far: a branch is not taken



#### Solution for Control Hazard: Branch Prediction

Guess one direction then backup if wrong

-Our prediction so far: a branch is not taken



Prediction No penalty

### Solution for Control Hazard: Branch Prediction

Guess one direction then backup if wrong

incorrect

-Our prediction so far: a branch is not taken

when we are wrong



### Types of Branch Prediction

Guess one direction then backup if wrong

- 1. Static branch prediction
  - A fixed decision based on certain *predefined rules*
  - Rule #1: Predicting Not Taken
  - Rule #2: Predicting *Taken*

Programs with few branches tend to have poor performance

Programs with many branches tend to have poor performance

A simple static prediction scheme will probably waste too much performance

### Types of Branch Prediction

Guess one direction then backup if wrong

#### 1. Static branch prediction

- A fixed decision based on certain <u>predefined rules</u>
- Rule #1: Predicting Not Taken
- Rule #2: Predicting Taken

#### 2. Dynamic branch prediction

- Prediction of branches at runtime using runtime information

#### **Dynamic Branch Prediction: Observation**

Branch history is important: if a branch was taken previously, it is likely that the branch will be taken again the next time

#### Dynamic Branch Prediction: Basic Idea



The memory contains one or more bits indicating whether the branch was recently taken or not

### Dynamic Branch Prediction: 1-bit Prediction



#### **Dynamic Branch Prediction: Initial State**

Current policy: the next instruction in sequence is fetched

0 (Not taken)

Branch prediction buffer (a.k.a., branch history table)



#### Dynamic Branch Prediction: 1st Iteration 98



Mispredict cycle penalty)



### Dynamic Branch Prediction: Update Buffer

Branch outcome: taken



(Taken) Branch prediction buffer (a.k.a., branch history table) Not taken Taken Not taken Predict not taken

### Dynamic Branch Prediction: 2nd Iteration 95





### Dynamic Branch Prediction: Update Buffer

Branch outcome: taken



(Taken) Branch prediction buffer (a.k.a., branch history table) No change Not taken Not taken Predict not taken Taken

#### Dynamic Branch Prediction: 3rd Iteration 97







### Dynamic Branch Prediction

- Prediction of branches <u>at runtime</u> using <u>runtime information</u>
- Use a branch prediction buffer (a.k.a., branch history table)
  - A bit says whether the branch was recently taken or not
  - If an instruction is predicted to be taken, fetching begins from the branch target
- Ideally, each branch instruction would have its own branch prediction buffer
  - However, due to the performance issue, the buffer table size is limited
  - Therefore, multiple branch instructions can share the same buffer
  - Indexed using the lower bits of the branch instruction's address



Initial state



| for | (i  | = 0 | , i | <  | 16 | 90; | ; i- | ++)/ |    |
|-----|-----|-----|-----|----|----|-----|------|------|----|
|     | for | (j  | =   | 0, | j  | <   | 3;   | j++  | -) |
|     |     | 100 | ор  | bo | dy |     |      |      |    |

| Time | <b>Buffer State</b> | Outcome | Result? |
|------|---------------------|---------|---------|
| 1    | Not taken           | Taken   | Wrong   |
| 2    | Taken               | Taken   | Correct |



| Time Buffer State |           | Outcome | Result? |  |
|-------------------|-----------|---------|---------|--|
| 1                 | Not taken | Taken   | Wrong   |  |
| 2                 | Taken     | Taken   | Correct |  |
| 3                 | Taken     | Taken   | Correct |  |



| Time | Buffer State | Outcome   | Result? |
|------|--------------|-----------|---------|
| 1    | Not taken    | Taken     | Wrong   |
| 2    | Taken        | Taken     | Correct |
| 3    | Taken        | Taken     | Correct |
| 4    | Taken        | Not taken | Wrong   |



Mispredict as taken on last iteration of inner loop

|   | Time    | Buffer State | Outcome   | Result? |
|---|---------|--------------|-----------|---------|
|   | 1       | Not taken    | Taken     | Wrong   |
|   | 2       | Taken        | Taken     | Correct |
|   | 3 Taken |              | Taken     | Correct |
| 4 |         | Taken        | Not taken | Wrong   |
|   | 5       | Not taken    | Taken     | Wrong   |
|   |         |              |           | ·       |



Then mispredict as not taken on first iteration of inner loop next time around

**Problem**: Inner loop branches mispredicted twice!



|   | Time | <b>Buffer State</b> | Outcome   | Result? |
|---|------|---------------------|-----------|---------|
|   |      | Not taken           | Taken     | Wrong   |
|   | 2    | Taken               | Taken     | Correct |
|   | 3    | Taken               | Taken     | Correct |
|   | 4    | Taken               | Not taken | Wrong   |
|   | 5    | Not taken           | Taken     | Wrong   |
|   | 6    | Taken               | Taken     | Correct |
| • | 7    | Taken               | Taken     | Correct |
|   | 8    | Taken               | Not taken | Wrong   |
|   | 9    | Not taken           | Taken     | Wrong   |
|   | 10   | Taken               | Taken     | Correct |
|   | 11   | Taken               | Taken     | Correct |
|   | 12   | Taken               | Not taken | Wrong   |
|   |      |                     |           |         |







#### **Dynamic Branch Prediction: 2-bit Prediction**

10

One misprediction each loop execution

Predict not taken

Initial state

| Time | Buffer St   | é   | Outcome | Result? |
|------|-------------|-----|---------|---------|
| 1    | Not taken ( | 10) | Taken   | Wrong   |



Not taken

Taken

Predict not taken

Not taken

#### **Dynamic Branch Prediction: 2-bit Prediction**

One misprediction each loop execution

Predict taken

Predict not taken

Taken

Not taken



Taken

Not taken

Taken

Predict taken

00

Predict not taken

Not taken

| Time | Buffer State   | Outcome | Result? |
|------|----------------|---------|---------|
| 1    | Not taken (10) | Taken   | Wrong   |
| 2    | Not taken (11) | Taken   | Wrong   |

## **Dynamic Branch Prediction: 2-bit Prediction**



| Time | Buffer State   | Outcome | Result? |
|------|----------------|---------|---------|
| 1    | Not taken (10) | Taken   | Wrong   |
| 2    | Not taken (11) | Taken   | Wrong   |
| 3    | Taken (01)     | Taken   | Correct |

## **Dynamic Branch Prediction: 2-bit Prediction**



| <b>*</b> | Time | Buffer State   | Outcome   | Result? |
|----------|------|----------------|-----------|---------|
|          | 1    | Not taken (10) | Taken     | Wrong   |
|          | 2    | Not taken (11) | Taken     | Wrong   |
|          | 3    | Taken (01)     | Taken     | Correct |
|          | 4    | Taken (00)     | Not taken | Wrong   |

| Taken             | •            |                   |
|-------------------|--------------|-------------------|
|                   | Not taken    |                   |
| Predict taken     |              | Predict taken     |
| 00                | Taken        | 01                |
|                   |              | Not taken Taken   |
|                   | Not taken    |                   |
| Predict not taken | , tot tallon | Predict not taken |
| 10                | Taken        | 11                |
| Not taken         |              |                   |

## **Dynamic Branch Prediction: 2-bit Prediction**



| <b>&gt;</b> | Time | Buffer State   | Outcome   | Result? |
|-------------|------|----------------|-----------|---------|
|             | 1    | Not taken (10) | Taken     | Wrong   |
|             | 2    | Not taken (11) | Taken     | Wrong   |
|             | 3    | Taken (01)     | Taken     | Correct |
|             | 4    | Taken (00)     | Not taken | Wrong   |
|             | 5    | Taken (01)     | Taken     | Correct |

| Predict taken                   | Not taken<br>Taken | Predict t    | aken  |
|---------------------------------|--------------------|--------------|-------|
|                                 |                    | Not taken    | Taken |
| Predict not taken 10  Not taken | Not taken Taken    | Predict no 1 |       |

### **Dynamic Branch Prediction: 2-bit Prediction**

One mis

| Taken             |           |                   |
|-------------------|-----------|-------------------|
|                   | Not taken |                   |
| Predict taken     | Taken     | Predict taken 01  |
| 00                | Takon     | 1                 |
|                   |           | Not taken Taken   |
|                   | Not taken |                   |
| Predict not taken |           | Predict not taken |
| 10                | Taken     | 11                |
|                   |           |                   |
| Not taken         |           |                   |

| Time | Buffer State   | Outcome   | Result? |
|------|----------------|-----------|---------|
| 1    | Not taken (10) | Taken     | Wrong   |
| 2    | Not taken (11) | Taken     | Wrong   |
| 3    | Taken (01)     | Taken     | Correct |
| 4    | Taken (00)     | Not taken | Wrong   |
| 5    | Taken (01)     | Taken     | Correct |
| 0    | Taken (00)     | Taken     | Correct |
| 7    | Taken (00)     | Taken     | Correct |
| 8    | Taken (00)     | Not taken | Wrong   |
| 9    | Taken (01)     | Taken     | Correct |
| 10   | Taken (00)     | Taken     | Correct |
| 11   | Taken (00)     | Taken     | Correct |
| 12   | Taken (00)     | Not taken | Wrong   |
|      |                |           |         |

# Solution for Control Hazard: Branch Prediction



Prediction is effective, but we don't have a hardware expert. In this situation, is there any way to resolve the control hazard in software manner?

Delayed Branch! (Compiler-driven optimization)

### Our Assumption on HW: Do not Flush



regardless of whether the branch is taken or not

# Solution for Control Hazard: Delayed Branch



Compiler insert the **branch delay slot** with instruction that is not affected by the branch

# Solution for Control Hazard: Delayed Branch



Compiler insert the **branch delay slot** with instruction that is not affected by the branch

The simplest method is to insert a nop instruction (1 cycle penalty)

# Solution for Control Hazard: Delayed Branck

(with advanced compiler)



Code reordering (move this instruction, which is not affected by the branch)

## Solution for Control Hazard: Delayed Branck

(with advanced compiler)



(no penalty)

if \$s2 = 0 then add \$s1, \$s2, \$s3

# Solution for Control Hazard: Delayed Branck

(with advanced compiler)



Equivalent meaning







Situations that prevent starting the next instruction in the next cycle

# Hazard #1: Structural hazard

Conflict for use of a hardware resource

#### Solution:

- Stall
- Resource duplication

# Hazard #2: Data hazard

An instruction cannot execute because data is not yet available

#### **Solution:**

- Stall
- Forwarding
- Compiler optimization

#### Hazard #3: Control hazard

The next instruction is uncertain due to branching

#### Solution:

- Stall
- Optimized branch processing
- Branch prediction
- Delayed branch

# Question?