## CMP-3004
## Computer Organization

### Spring 2023


## Review 

### Non-pipelined v. pipelined

![](../images/pipe4.png)

- In a pipelined design, every stage takes one clock cycle, so the clock cycle must be long enough for the slowest stage: the worst-case 200 ps
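
As a rough worked example of where the 200 ps comes from, the sketch below compares total execution time with and without pipelining. The per-stage latencies are assumed, illustrative values (they are not read off the figure above), and the pipelined estimate assumes no hazards.

```python
# Assumed stage latencies in picoseconds (illustrative values only).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

def single_cycle_time(n_instructions, stages=STAGE_PS):
    """Non-pipelined: every instruction takes the sum of all stage latencies."""
    return n_instructions * sum(stages.values())

def pipelined_time(n_instructions, stages=STAGE_PS):
    """Pipelined: the clock is set by the slowest stage; once the pipeline
    is full, one instruction completes per cycle (no hazards assumed)."""
    cycle = max(stages.values())
    return (len(stages) + n_instructions - 1) * cycle

print(single_cycle_time(3))  # 2400
print(pipelined_time(3))     # 1400
```

With these assumed numbers, three instructions take 2400 ps without pipelining and 1400 ps with it; the gap widens as the instruction count grows.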

## Pipeline hazards

There are situations in pipelining when the next instruction cannot execute in the following clock cycle. These situations are called hazards



### Structural hazards

The hardware cannot support the combination of instructions that we want to execute in the same clock cycle

- Washer-dryer combination instead of a separate washer and dryer
- The MIPS pipeline uses two memories, one for instructions and one for data. With a single memory, the pipeline could have a structural hazard
    - accessing data v. fetching an instruction in the same clock cycle

### Data hazards

The pipeline must be stalled because one step must wait for another to complete

- Dependence of one instruction on an earlier one that is still in the pipeline

```
add $s0, $t0, $t1
sub $t2, $s0, $t3
```

- The add instruction doesn't write its result until the fifth stage, so without any remedy three clock cycles in the pipeline are wasted

    - Solution: we do not need to wait for the add to complete before resolving the data hazard
    - As soon as the ALU creates the sum for the add, it can be supplied as an input to the subtract

### Forwarding or Bypassing

- Extra hardware is added to retrieve the missing item early from the internal resources
- Forwarding paths are valid only if the destination stage is later in time than the source stage

![](../images/pipe5.png)

### Stalling

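Forwarding cannot prevent all pipeline stalls. A stall is needed when an R-format instruction that uses the loaded register immediately follows a load, because the loaded data is only available after the MEM stage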

```
lw $s0, 20($t1)
sub $t2, $s0, $t3
```

![](../images/pipe6.png)

### Control hazards

A decision is made based on the results of one instruction while others are executing

- The instruction following the branch must be fetched on the next clock cycle, but the pipeline cannot yet know which instruction that should be

- **Option 1:** after fetching a branch, stall until the branch outcome is known and the address of the next instruction can be determined (larger slowdown)

- **Option 2:** predict which branches are taken and which are not. Dynamic hardware predictors base the prediction for each branch on its past behavior and may change predictions over the life of a program

    - Analyze the history of taken or untaken branches to predict the future
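
To make history-based prediction concrete, here is a minimal sketch of a 2-bit saturating counter predictor. This is one common dynamic scheme chosen for illustration; the notes do not name a specific predictor, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address (illustrative sketch).
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, pc):
        # Unseen branches start at 1 (weakly not taken); an arbitrary choice.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)

# A loop branch that is taken 9 times and then falls through once.
history = [True] * 9 + [False]
print(accuracy(TwoBitPredictor(), 0x400, history))
```

The predictor mispredicts only the first iteration (before it has any history) and the final loop exit, which is why a short history already works well for loop branches.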

### Control hazards

![](../images/pipe7.png)

### Delayed decision

- **Option 3:** delayed decision (MIPS). The delayed branch always executes the next sequential instruction, with the branch taking place after that one instruction delay
    - Execute something independent from the branch, so that cycle is not wasted
- MIPS software always schedules a branch-independent instruction after the branch, and a taken branch changes the address of the instruction that follows this safe instruction

## Pipelined datapath and control

We can structure our datapath so that up to five instructions are in execution during the same clock cycle.

We separate the execution of each instruction into five stages:

1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
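
The overlap of the five stages can be sketched as a small table generator. The three-instruction program is a hypothetical example, and the diagram assumes an ideal pipeline with no hazards.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Return one row per instruction; column c holds the stage occupied
    in clock cycle c + 1, assuming an ideal pipeline with no hazards."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i in range(len(instructions)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i + 1
        rows.append(row)
    return rows

# Hypothetical program used only to label the rows.
program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]
for text, row in zip(program, pipeline_diagram(program)):
    print(f"{text:<18}" + "".join(f"{stage:<5}" for stage in row))
```

Reading a column top to bottom shows which stage each instruction occupies in that clock cycle.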

### Pipelined datapath and control

![](../images/pipe_dp1.png)

### Pipelined execution

Let's pretend each instruction has its own datapath

![](../images/pipe_dp2.png)

- to allow sharing of resources, we need to have registers wherever there are dividing lines between stages
    - for example: the instruction memory (IM) is used in only one stage, so to retain the fetched instruction for the other four stages we must save it in a register
- in the laundry analogy, we need a basket between each pair of stages to move the clothes along

### Pipelined datapath with registers

All instructions advance during each clock cycle from one pipeline register to the next

![](../images/pipe_dp3.png)

### Stage 1

- Example: lw instruction
- The incremented address (PC+4) is saved in the IF/ID pipeline register in case a later instruction stage needs it (e.g. for `beq`)

![](../images/pipe_dp4.png)

### Stage 2

- The sign-extended 16-bit immediate field, the values read from the two registers, and the incremented PC are stored in the ID/EX pipeline register.

![](../images/pipe_dp5.png)


### Stage 3

- The ALU calculates the memory address and saves it in the EX/MEM pipeline register

![](../images/pipe_dp6.png)


### Stage 4

- Data memory is read using the address from the EX/MEM pipeline register, and the loaded data is stored in the MEM/WB pipeline register

![](../images/pipe_dp7.png)

### Stage 5

- The data is read from the MEM/WB pipeline register and written into the register file
- The destination register number must be preserved along the stages

![](../images/pipe_dp8.png)

**Problem:** The instruction in the IF/ID pipeline register supplies the write register number. **This value gets overwritten by the next instruction**

### Corrected pipeline

![](../images/pipe_dp9.png)
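
The fix can be sketched by tracing a single `lw` through the four pipeline registers, with the destination register number carried along at every step. The dictionary field names are hypothetical and the register file read is replaced by a plain argument; this is a sketch of the idea, not of the exact datapath in the figure.

```python
def run_lw(rt, base_value, offset, data_memory, pc=0):
    """Trace one lw through the pipeline registers.
    Each dict models what the instruction leaves behind for the next stage."""
    # IF: fetch the instruction and save the incremented PC.
    if_id = {"pc_plus_4": pc + 4, "rt": rt, "imm": offset}
    # ID: read the base register (base_value stands in for the register
    # file read) and pass the destination register number along.
    id_ex = {"pc_plus_4": if_id["pc_plus_4"], "read_data_1": base_value,
             "imm": if_id["imm"], "write_reg": if_id["rt"]}
    # EX: the ALU adds base and offset to form the memory address.
    ex_mem = {"alu_result": id_ex["read_data_1"] + id_ex["imm"],
              "write_reg": id_ex["write_reg"]}
    # MEM: read data memory at the computed address.
    mem_wb = {"read_data": data_memory[ex_mem["alu_result"]],
              "write_reg": ex_mem["write_reg"]}
    # WB: the register number comes from MEM/WB, not from IF/ID, which by
    # now holds a younger instruction.
    return mem_wb["write_reg"], mem_wb["read_data"]

# Load into register 16 from address 100 + 20.
print(run_lw(rt=16, base_value=100, offset=20, data_memory={120: 42}))
```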

### Control values

The control values are the same as in the single-cycle datapath, but we still need pipeline registers to preserve them while each instruction moves through the stages

![](../images/pipe_dp10.png)

### Control values pipeline

Control signals are used in the appropriate pipeline stage as the instruction moves down the pipeline

![](../images/pipe_dp11.png)
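
One way to picture this is to group the control signals by the stage that consumes them and drop a group once its stage has passed. The signal names, the grouping, and the values for `lw` below follow the usual textbook MIPS design and are assumptions here; the figure may label them differently.

```python
# Control values for lw, grouped by the stage that uses them (assumed,
# following the usual textbook MIPS design).
LW_CONTROL = {
    "EX":  {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
    "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
    "WB":  {"RegWrite": 1, "MemtoReg": 1},
}

def propagate(control):
    """Return the control groups stored in each pipeline register.
    Each stage uses its own group and passes the remaining ones on."""
    id_ex = dict(control)                                      # EX, MEM, WB
    ex_mem = {k: v for k, v in id_ex.items() if k != "EX"}     # MEM, WB
    mem_wb = {k: v for k, v in ex_mem.items() if k != "MEM"}   # WB only
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}

for register, groups in propagate(LW_CONTROL).items():
    print(register, list(groups))
```

The pipeline registers therefore get narrower from left to right: ID/EX carries all three groups, while MEM/WB carries only the write-back signals.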

### Full pipeline

![](../images/pipe_dp12.png)

## Data hazards

```
sub $2, $1, $3     # Register $2 written by sub
and $12, $2, $5    # 1st operand ($2) depends on sub
or  $13, $6, $2    # 2nd operand ($2) depends on sub
add $14, $2, $2    # 1st ($2) & 2nd ($2) depend on sub
sw  $15, 100($2)   # Base ($2) depends on sub
```
- The last four instructions are all dependent on the result in register `$2` of the
first instruction

- If register `$2` had the value 10 before the subtract instruction and −20 afterwards, the programmer intends that −20 will be used in the following instructions that refer to register `$2`
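
The dependences in this sequence can be found mechanically. The sketch below encodes each instruction as a hypothetical `(op, destination, sources)` tuple and lists the read-after-write pairs; it only detects dependences and says nothing yet about which of them actually cause a hazard.

```python
# (op, destination register or None, source registers) for the sequence above.
PROGRAM = [
    ("sub", 2,    (1, 3)),
    ("and", 12,   (2, 5)),
    ("or",  13,   (6, 2)),
    ("add", 14,   (2, 2)),
    ("sw",  None, (15, 2)),  # sw reads $15 (data) and $2 (base), writes no register
]

def raw_dependences(program):
    """List (producer_index, consumer_index, register) read-after-write pairs."""
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            # Search backwards for the most recent instruction writing reg.
            for i in range(j - 1, -1, -1):
                if reg != 0 and program[i][1] == reg:
                    deps.append((i, j, reg))
                    break
    return deps

for producer, consumer, reg in raw_dependences(PROGRAM):
    print(f"{PROGRAM[consumer][0]} depends on {PROGRAM[producer][0]} via ${reg}")
```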

### Problems with this example

![](../images/pipe_dp13.png)

### Dependencies

- It is possible to supply the inputs to the ALU needed by the AND instruction and OR instruction by forwarding the results found in the pipeline registers

![](../images/pipe_dp14.png)

### Forwarding unit

We now add multiplexors at the ALU inputs to provide the forwarding paths, and a forwarding unit to select the proper path.

![](../images/pipe_dp15.png)

### Forwarding control signals

![](../images/pipe_dp16.png)
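
The selection logic of a forwarding unit can be sketched as follows. The field names (`Rs`, `Rt`, `Rd`, `RegWrite`) and the 2-bit encodings follow the common textbook formulation and are assumptions here; the table in the figure may use different labels.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Return the 2-bit mux select for one ALU operand.
    0b00: value read from the register file (via ID/EX)
    0b10: ALU result of the previous instruction (EX/MEM)
    0b01: result of the instruction two ahead (MEM/WB)"""
    # The EX/MEM result is the most recent, so it takes priority.
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg:
        return 0b10
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg:
        return 0b01
    return 0b00

def forwarding_unit(id_ex, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(id_ex["Rs"], ex_mem, mem_wb),
            forward_select(id_ex["Rt"], ex_mem, mem_wb))

# `and $12, $2, $5` in EX while `sub $2, $1, $3` is in MEM.
print(forwarding_unit({"Rs": 2, "Rt": 5},
                      {"RegWrite": True, "Rd": 2},
                      {"RegWrite": False, "Rd": 0}))
```

The check against register 0 matters because `$0` is hard-wired to zero and must never be replaced by a forwarded value.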

### Datapath with forwarding unit

![](../images/pipe_dp17.png)

### Other hazards

![](../images/pipe_dp18.png)

### Stalling

![](../images/pipe_dp19.png)

### Hazard detection

![](../images/pipe_dp20.png)
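
A minimal sketch of the load-use check performed by a hazard detection unit is shown below. The field names follow the common textbook formulation and are assumptions; register numbers stand in for the names used earlier (`$s0` = 16, `$t3` = 11).

```python
def must_stall(id_ex, if_id):
    """Load-use check: stall when the instruction in EX is a load whose
    destination (Rt) is a source of the instruction being decoded."""
    return bool(id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"]))

# lw $s0, 20($t1) is in EX; sub $t2, $s0, $t3 is in ID.
lw_in_ex = {"MemRead": True, "Rt": 16}
sub_in_id = {"Rs": 16, "Rt": 11}
print(must_stall(lw_in_ex, sub_in_id))  # True
```

When the check fires, the pipeline holds the PC and the IF/ID register for one cycle and inserts a bubble, after which forwarding from MEM/WB can supply the loaded value.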