# CS/ECE 5381/7381 Computer Architecture Spring 2023

Dr. Manikas

Computer Science

Lecture 10: Feb. 28, 2023

## Project 2

- Due TODAY Tues, Feb. 28 (11:59 pm)
- MARS tool
- Assignment:
  - Translate high-level language code into MIPS code
  - Run code on MARS tool

## Project 3 (7381 only)

- Due Thur., Mar 2 (11:59 pm)
- For CS/ECE 7381 students ONLY
- Additional MIPS programming assignment using MARS tool

## Instruction-Level Parallelism (ILP) and Its Exploitation

(Chapter 3, Hennessy and Patterson)

Note: some course slides adopted from publisher-provided material

#### Outline

- 3.1 ILP Background
- 3.2 Basic Compiler Techniques for ILP
- 3.3 Branch Prediction
- 3.4 Data Hazards and Dynamic Scheduling
- 3.5 Dynamic Scheduling Algorithm
- 3.6 Hardware-Based Speculation

## Scheduling

- Adding stalls is a simple way to address data dependencies and latencies
- However, adding stalls = adding cycles, which affects overall performance
- An alternate approach is to re-arrange the instruction order
  - Also called instruction re-ordering or scheduling

## Instruction Dependencies

| Opcode | Operands   | Latency | Inputs | Outputs  |
|--------|------------|---------|--------|----------|
| L.D    | F0,0(R1)   | 1       | R1     | FO       |
| ADD.D  | F4,F0,F2   | 2       | F0, F2 | F4       |
| S.D    | F4,0(R1)   | 0       | F4, R1 | (memory) |
| DADDUI | R1,R1,#-8  | 1       | R1     | R1       |
| BNE    | R1,R2,LOOP | 0       | R1,R2  | (branch) |

| Register | Used as Input         | Used as Output |
|----------|-----------------------|----------------|
| R1       | L.D, S.D, DADDUI, BNE | DADDUI         |
| R2       | BNE                   |                |
| F0       | ADD.D                 | L.D            |
| F2       | ADD.D                 |                |
| F4       | S.D                   | ADD.D          |

#### Graphical Representation of Code







Instruction latencies in RED



Input-output dependency of register x

#### **Code Modification**



Need to add 2 stalls after ADD.D as before:

| LOOP: | L.D    | F0,0(R1)   |
|-------|--------|------------|
|       | ADD.D  | F4,F0,F2   |
|       | stall  |            |
|       | stall  |            |
|       | S.D    | F4, 0(R1)  |
|       | DADDUI | R1,R1,#-8  |
|       | BNE    | R1,R2,LOOP |

#### Using Graph to Revise Code



Note: ADD.D is independent of DADDUI, so both instructions can execute in parallel – revise code:

```
LOOP: L.D F0,0(R1)

DADDUI R1,R1,#-8

ADD.D F4,F0,F2

stall

stall

S.D F4, 8(R1)

BNE R1,R2,LOOP
```

Instruction latencies in RED

 $\xrightarrow{\mathsf{X}}$ 

Input-output dependency of register x

## **Loop Unrolling**

- How to handle branch scheduling?
  - "unroll" loops
    - Multiple replications of loop body
  - Eliminates branching (and associated hazards)
  - But: increases code size (more instructions)

## Example – Loop Unrolling

#### Original Code

| LOOP: | L.D    | F0,0(R1)   |
|-------|--------|------------|
|       | ADD.D  | F4,F0,F2   |
|       | S.D    | F4,0(R1)   |
|       | DADDUI | R1,R1,#-8  |
|       | BNE    | R1,R2,LOOP |

Now, assume that we know the loop will be executed at least twice – unroll the loop for these two iterations



| LOOP: | L.D<br>ADD.D | F0,0(R1)<br>F4,F0,F2     | ;first memory location      |
|-------|--------------|--------------------------|-----------------------------|
|       | S.D          | F4,0(R1)                 |                             |
|       | L.D          | F1,- <mark>8</mark> (R1) | ;second memory location     |
|       | ADD.D        | F5,F1,F2                 | ;use new registers (F1, F5) |
|       | S.D          | F5,-8(R1)                | to help with scheduling;    |
|       | DADDUI       | R1,R1,#- <mark>16</mark> | ;decrement by 2 addresses   |
|       | BNE          | R1,R2,LOOP               |                             |

## Scheduling the Unrolled Loop





#### Outline

- 3.1 ILP Background
- 3.2 Basic Compiler Techniques for ILP
- 3.3 Branch Prediction
- 3.4 Data Hazards and Dynamic Scheduling
- 3.5 Dynamic Scheduling Algorithm
- 3.6 Hardware-Based Speculation

#### **Branch Prediction**

- Loop unrolling can handle simple branching instances (if we can estimate number of iterations)
  - Difficult for more complex branching (if-then-else, subroutine calls)
- Alternate approach try to predict branch behavior

## Static vs. Dynamic Prediction

- Static done before execution during compiling
  - Requires statistical estimation
  - Often inaccurate
- Dynamic done during execution
  - More commonly used
  - Predictions updated based on program behavior



#### **Correlating Branch Predictors**

- 2-bit predictor: uses recent behavior of single branch
- What if we also examine recent behavior of other branches?

Note that branch B3
depends on outcome of B1,
B2
If **both** B1 and B2 taken,
then B3 <u>not</u> taken

There is a <u>correlation</u> between B3, B2, B1

#### Outline

- 3.1 ILP Background
- 3.2 Basic Compiler Techniques for ILP
- 3.3 Branch Prediction
- 3.4 Data Hazards and Dynamic Scheduling
- 3.5 Dynamic Scheduling Algorithm
- 3.6 Hardware-Based Speculation

#### Data Hazards and Dynamic Scheduling

- *Static scheduling*: compiler schedules instructions
  - Done before execution
- *Dynamic scheduling*: hardware schedules instructions
  - Rearranges instructions during execution
  - "out-of-order" execution
  - Can be used to overcome data hazards

#### Dynamic Scheduling

Rearrange order of instructions to reduce stalls while maintaining data flow

#### Advantages:

- Compiler doesn't need to have knowledge of microarchitecture
- Handles cases where dependencies are unknown at compile time
- Disadvantage:
  - Substantial increase in hardware complexity
  - Complicates exceptions

#### Dynamic Scheduling

- Dynamic scheduling implies:
  - Out-of-order execution
  - Out-of-order completion
- Creates the possibility for WAR and WAW hazards
- Tomasulo's Approach
  - Tracks when operands are available
  - Introduces register renaming in hardware
    - Minimizes WAW and WAR hazards

#### Register Renaming

#### Example:

+ name dependence with F6

#### Register Renaming

• Example:

```
DIV.D F0,F2,F4
ADD.D S,F0,F8
S.D S,O(R1)
SUB.D T,F10,F14
MUL.D F6,F10,T
```

Now only RAW hazards remain, which can be strictly ordered

#### Outline

- 3.1 ILP Background
- 3.2 Basic Compiler Techniques for ILP
- 3.3 Branch Prediction
- 3.4 Data Hazards and Dynamic Scheduling
- 3.5 Dynamic Scheduling Algorithm
- 3.6 Hardware-Based Speculation

#### A Dynamic Algorithm: Tomasulo's

- For IBM 360/91 (before caches!)
  - − ⇒ Long memory latency
- Goal: High Performance without special compilers
- Small number of floating point registers (4 in 360)
   prevented interesting compiler scheduling of operations
  - This led Tomasulo to try to figure out how to get more effective registers — renaming in hardware!
- Why Study 1966 Computer?
- The descendants of this have flourished!
  - Alpha 21264, Pentium 4, AMD Opteron, Power 5, ...

#### Tomasulo Algorithm

- Control & buffers <u>distributed</u> with Function Units (FU)
  - FU buffers called "<u>reservation stations</u>"; have pending operands
- Registers in instructions replaced by values or pointers to reservation stations(RS); called register renaming;
  - Renaming avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't

#### Tomasulo Algorithm

- Results to FU from RS, <u>not through registers</u>, over <u>Common Data Bus</u> that broadcasts results to all FUs
  - Avoids RAW hazards by executing an instruction only when its operands are available
- Load and Stores treated as FUs with RSs as well
- Integer instructions can go past branches (predict taken), allowing FP ops beyond basic block in FP queue

## **Tomasulo Organization**



#### Reservation Station Components

Op: Operation to perform in the unit (e.g., + or –)

Vj, Vk: Value of Source operands

Store buffers has V field, result to be stored

Qj, Qk: Reservation stations producing source registers (value to be written)

- Note: Qj,Qk=0 => ready
- Store buffers only have Qi for RS producing result

Busy: Indicates reservation station or FU is busy

Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.

#### Three Stages of Tomasulo Algorithm

#### 1.Issue—get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

#### 2. Execute—operate on operands (EX)

When both operands ready then execute; if not ready, watch Common Data Bus for result

#### 3. Write result—finish execution (WB)

Write on Common Data Bus to all awaiting units; mark reservation station available

## Assume we have the following MIPS code – how is this handled using Tomasulo's algorithm?

|       |             | Latency |
|-------|-------------|---------|
| L.D   | F6, 34(R2)  | 2       |
| L.D   | F2, 45(R3)  | 2       |
| MUL.D | F0, F2, F4  | 10      |
| SUB.D | F8, F6, F2  | 2       |
| DIV.D | F10, F0, F6 | 40      |
| ADD.D | F6, F8, F2  | 2       |

#### Instruction status table:

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 |       |      |        |
| L.D         | F2  | 45 | R3 |       |      |        |
| MUL.D       | FO  | F2 | F4 |       |      |        |
| SUB.D       | F8  | F6 | F2 |       |      |        |
| DIV.D       | F10 | F0 | F6 |       |      |        |
| ADD.D       | F6  | F8 | F2 |       |      |        |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
|      | Add1  |
|      | Add2  |
|      | Add3  |
|      | Mult1 |
|      | Mult2 |

|      |    | S1 | S2 | RS | RS |
|------|----|----|----|----|----|
| Busy | Op | Vj | Vk | Qj | Qk |

| Busy | Op | Vj | Vk | Qj | Qk |
|------|----|----|----|----|----|
| No   |    |    |    |    |    |

#### Register result status:

| Clock | _  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | F14 |
|-------|----|----|----|----|----|----|-----|-----|-----|
| 0     | FU |    |    |    |    |    |     |     |     |

#### Instruction status table: Load buffers: Instruction j k Exec Write Address Issue Busy result comp Yes 34+R2 L.D F6 34 R2 1 Load1 Load2 No L.D F2 45 R3 No MUL.D F2 F4 Load3 F0 SUB.D F8 F6 F2 DIV.D F10 F0 F6 ADD.D F6 F8 F2 CYCLE 1: issue first instruction (L.D with **Reservation Stations: S1** S2 RS RS latency 2) Time Name Busy ۷j ٧k Qj Qk Op Add1 No Add2 No Add3 No Mult1 No Mult2 Νo Register result status: Clock F2 F0 F4 F6 F8 F10 F12 F14 1 FU Load1

| Instruction sta       | tus table: |      |       |       |       |        | _   | Load buffe | ers:      |         |
|-----------------------|------------|------|-------|-------|-------|--------|-----|------------|-----------|---------|
| Instruction           |            | j    | k     | Issue | Exec  | Write  |     |            | Busy      | Address |
|                       |            |      |       |       | comp  | result |     | _          |           |         |
| L.D                   | F6         | 34   | R2    | 1     |       |        |     | Load1      | Yes       | 34+R2   |
| L.D                   | F2         | 45   | R3    | 2     |       |        |     | Load2      | Yes       | 45+R3   |
| MUL.D                 | F0         | F2   | F4    |       |       |        |     | Load3      | No        |         |
| SUB.D                 | F8         | F6   | F2    |       |       |        |     | _          |           |         |
| DIV.D                 | F10        | F0   | F6    |       |       |        |     |            |           |         |
| ADD.D                 | F6         | F8   | F2    |       |       |        |     | CVCLE      | 2. iccur  | second  |
| Reservation Stations: |            |      |       | S1    | S2    | RS     | RS  | instru     | ction (L. |         |
| Time                  | Name       | Busy | Ор    | Vj    | Vk    | Qj     | Qk  | latenc     | y 2)      |         |
|                       | Add1       | No   |       |       |       |        |     |            |           |         |
|                       | Add2       | No   |       |       |       |        |     |            |           |         |
|                       | Add3       | No   |       |       |       |        |     |            |           |         |
|                       | Mult1      | No   |       |       |       |        |     |            |           |         |
|                       | Mult2      | No   |       |       |       |        |     |            |           |         |
| Register result       | status:    |      |       |       |       |        |     |            |           |         |
| Clock                 |            | F0   | F2    | F4    | F6    | F8     | F10 | F12        | F14       |         |
| 2                     | FU         |      | Load2 |       | Load1 |        |     |            |           |         |

#### Instruction status table:

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    |        |
| L.D         | F2  | 45 | R3 | 2     |      |        |
| MUL.D       | FO  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 |       |      |        |
| DIV.D       | F10 | F0 | F6 |       |      |        |
| ADD.D       | F6  | F8 | F2 |       |      |        |
|             |     |    |    |       |      |        |

#### Load buffers:

|       | ,   |       |
|-------|-----|-------|
| Load1 | Yes | 34+R2 |
| Load2 | Yes | 45+R3 |
| Load3 | No  |       |

Busv

Address

Issue MUL.D with latency 10, can't start yet until we have value for Load2

#### **Reservation Stations:**

| Time | Name  | Busy | Ор    | Vj | Vk |
|------|-------|------|-------|----|----|
|      | Add1  | No   |       |    |    |
|      | Add2  | No   |       |    |    |
|      | Add3  | No   |       |    |    |
|      | Mult1 | Yes  | MUL.D |    | F4 |
|      | Mult2 | No   |       |    |    |

#### Register result status:

| Clock | _  | F0    | F2    | F4 | F6    | F8 | F10 | F12 | F14 |
|-------|----|-------|-------|----|-------|----|-----|-----|-----|
| 3     | FU | Mult1 | Load2 |    | Load1 |    |     |     |     |

S1

S2

RS

Qj

Load2

RS

Qk

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    |        |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     |      |        |
| DIV.D       | F10 | F0 | F6 |       |      |        |
| ADD.D       | F6  | F8 | F2 |       |      |        |

#### **Load buffers:**

| _     |     |       |
|-------|-----|-------|
| Load1 | No  |       |
| Load2 | Yes | 45+R3 |
| Load3 | No  |       |

Address

Busy

| ervation S | tations: |      |       | S1 | S2 | RS    | RS    |
|------------|----------|------|-------|----|----|-------|-------|
| Time       | Name     | Busy | Ор    | Vj | Vk | Qj    | Qk    |
|            | Add1     | Yes  | SUB.D | F6 |    |       | Load2 |
|            | Add2     | No   |       |    |    |       |       |
|            | Add3     | No   |       |    |    |       |       |
|            | Mult1    | Yes  | MUL.D |    | F4 | Load2 |       |
|            | Mult2    | No   |       |    |    |       |       |
|            |          |      |       |    |    |       |       |

Issue SUB.D with latency 2, also waiting for Load2

| Clock | _  | F0    | F2    | F4 | F6       | F8   | F10 | F12 | F14 |
|-------|----|-------|-------|----|----------|------|-----|-----|-----|
| 4     | FU | Mult1 | Load2 |    | M[34+R2] | Add1 |     |     |     |

| Instruction sta | <u>atus table:</u> |    |    |       |      |        |
|-----------------|--------------------|----|----|-------|------|--------|
| Instruction     |                    | j  | k  | Issue | Exec | Write  |
|                 |                    |    |    |       | comp | result |
| L.D             | F6                 | 34 | R2 | 1     | 3    | 4      |
| L.D             | F2                 | 45 | R3 | 2     | 4    | 5      |
| MUL.D           | F0                 | F2 | F4 | 3     |      |        |
| SUB.D           | F8                 | F6 | F2 | 4     |      |        |
| DIV.D           | F10                | F0 | F6 | 5     |      |        |
| ADD.D           | F6                 | F8 | F2 |       |      |        |
|                 |                    |    | ·  |       |      |        |
| Reservation S   | tations:           |    |    | S1    | S2   | RS     |

#### Load buffers:

|       | ,  |  |
|-------|----|--|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

Busv

Address

Issue DIV.D with latency 40, waiting for Mult1

Timer starts down for Add1 (latency 2), Mult1 (latency 10)

#### Reservation Stations:

| Time | Name  | Busy | Ор    | Vj | Vk | Qj    | Q |
|------|-------|------|-------|----|----|-------|---|
| 2    | Add1  | Yes  | SUB.D | F6 | F2 |       |   |
|      | Add2  | No   |       |    |    |       |   |
|      | Add3  | No   |       |    |    |       |   |
| 10   | Mult1 | Yes  | MUL.D | F2 | F4 |       |   |
|      | Mult2 | Yes  | DIV.D |    | F6 | Mult1 |   |

#### Register result status:

| Clock | _  | F0    | F2       | F4 | F6       | F8   | F10   | F12 | F14 |
|-------|----|-------|----------|----|----------|------|-------|-----|-----|
| 5     | FU | Mult1 | M[45+R3] |    | M[34+R2] | Add1 | Mult2 |     |     |

RS

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | FO  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     |      |        |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     |      |        |

#### Load buffers:

|       | •  |  |
|-------|----|--|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

Address

Busy

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
| 1    | Add1  |
|      | Add2  |
|      | Add3  |
| 9    | Mult1 |
|      | Mult2 |
|      |       |

|      |                | 51                               | 52                                                | RS                                                                                                                                                                                          | RS                                                                                                                                                                                                                  |
|------|----------------|----------------------------------|---------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Busy | Op             | Vj                               | Vk                                                | Qj                                                                                                                                                                                          | Qk                                                                                                                                                                                                                  |
| Yes  | SUB.D          | F6                               | F2                                                |                                                                                                                                                                                             |                                                                                                                                                                                                                     |
| Yes  | ADD.D          |                                  | F2                                                | Add1                                                                                                                                                                                        |                                                                                                                                                                                                                     |
| No   |                |                                  |                                                   |                                                                                                                                                                                             |                                                                                                                                                                                                                     |
| Yes  | MUL.D          | F2                               | F4                                                |                                                                                                                                                                                             |                                                                                                                                                                                                                     |
| Yes  | DIV.D          |                                  | F6                                                | Mult1                                                                                                                                                                                       |                                                                                                                                                                                                                     |
|      | Yes Yes No Yes | Yes SUB.D Yes ADD.D No Yes MUL.D | Busy Op Vj Yes SUB.D F6 Yes ADD.D No Yes MUL.D F2 | Busy         Op         Vj         Vk           Yes         SUB.D         F6         F2           Yes         ADD.D         F2           No         Yes         MUL.D         F2         F4 | Busy         Op         Vj         Vk         Qj           Yes         SUB.D         F6         F2           Yes         ADD.D         F2         Add1           No         Yes         MUL.D         F2         F4 |

Issue ADD.D with latency 2, waiting for Add1

| Clock |    | F0    | F2       | F4 | F6   | F8   | F10   | F12 | F14 |
|-------|----|-------|----------|----|------|------|-------|-----|-----|
| 6     | FU | Mult1 | M[45+R3] |    | Add2 | Add1 | Mult2 |     |     |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    |        |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     |      |        |

#### Load buffers:

|       | busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

Addrace

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
| 0    | Add1  |
|      | Add2  |
|      | Add3  |
| 8    | Mult1 |
|      | Mult2 |
|      |       |

|      |       | <b>S1</b> | S2 | RS    | RS |
|------|-------|-----------|----|-------|----|
| Busy | Op    | Vj        | Vk | Qj    | Qk |
| Yes  | SUB.D | F6        | F2 |       |    |
| Yes  | ADD.D |           | F2 | Add1  |    |
| No   |       |           |    |       |    |
| Yes  | MUL.D | F2        | F4 |       |    |
| Yes  | DIV.D |           | F6 | Mult1 |    |

### Add1 (SUB.D) completing

| Clock |  |
|-------|--|
| 7     |  |

|    | FU    | F2       |
|----|-------|----------|
| FU | Mult1 | M[45+R3] |
|    |       |          |

| F4 | F6  |
|----|-----|
|    | Ado |

| F8   |   |
|------|---|
| Add1 | Γ |

| F12 |   | F1 |
|-----|---|----|
|     | _ |    |
|     |   |    |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     |      |        |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Гime | Name  |
|------|-------|
|      | Add1  |
| 2    | Add2  |
|      | Add3  |
| 7    | Mult1 |
|      | Mult2 |

|      |       | <b>S1</b> | S2 | RS    | RS |
|------|-------|-----------|----|-------|----|
| Busy | Op    | Vj        | Vk | Qj    | Qk |
| No   |       |           |    |       |    |
| Yes  | ADD.D | F6-F2     | F2 |       |    |
| No   |       |           |    |       |    |
| Yes  | MUL.D | F2        | F4 |       |    |
| Yes  | DIV.D |           | F6 | Mult1 |    |

| Clock | _  | F0    | F2       | F4 | F6   | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|------|-------|-------|-----|-----|
| 8     | FU | Mult1 | M[45+R3] |    | Add2 | F6-F2 | Mult2 |     |     |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     |      |        |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
|      | Add1  |
| 1    | Add2  |
|      | Add3  |
| 6    | Mult1 |
|      | Mult2 |

| Busy | Ор    | S1<br>Vj | S2<br>Vk | RS<br>Qj | RS<br>Qk |
|------|-------|----------|----------|----------|----------|
| No   |       |          |          |          |          |
| Yes  | ADD.D | F6-F2    | F2       |          |          |
| No   |       |          |          |          |          |
| Yes  | MUL.D | F2       | F4       |          |          |
| Yes  | DIV.D |          | F6       | Mult1    |          |

| Clock | _  | F0    | F2       | F4 | F6   | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|------|-------|-------|-----|-----|
| 9     | FU | Mult1 | M[45+R3] |    | Add2 | F6-F2 | Mult2 |     |     |

| Instruction sta | atus table: |      |       |       |      |        | _  | Load buffe | rs:     |         |
|-----------------|-------------|------|-------|-------|------|--------|----|------------|---------|---------|
| Instruction     |             | j    | k     | Issue | Exec | Write  |    |            | Busy    | Address |
|                 |             |      |       |       | comp | result |    | _          |         |         |
| L.D             | F6          | 34   | R2    | 1     | 3    | 4      |    | Load1      | No      |         |
| L.D             | F2          | 45   | R3    | 2     | 4    | 5      |    | Load2      | No      |         |
| MUL.D           | F0          | F2   | F4    | 3     |      |        |    | Load3      | No      |         |
| SUB.D           | F8          | F6   | F2    | 4     | 7    | 8      |    | _          |         |         |
| DIV.D           | F10         | F0   | F6    | 5     |      |        |    |            |         |         |
| ADD.D           | F6          | F8   | F2    | 6     | 10   |        |    |            |         |         |
|                 |             |      |       |       |      |        | •  | Add        | 2 (ADD  | .D)     |
| Reservation S   | tations:    |      |       | S1    | S2   | RS     | RS | com        | pleting |         |
| Time            | Name        | Busy | Ор    | Vj    | Vk   | Qj     | Qk |            |         |         |
|                 | Add1        | No   |       |       |      |        |    |            |         |         |
| 0               | Add2        | Yes  | ADD.D | F6-F2 | F2   |        |    |            |         |         |
|                 | Add3        | No   |       |       |      |        |    |            |         |         |
| 5               | Mult1       | Yes  | MUL.D | F2    | F4   |        |    |            |         |         |
|                 |             |      |       |       |      |        |    |            |         |         |

Mult2

Yes

| Clock | _  | F0    | F2       | F4 | F6   | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|------|-------|-------|-----|-----|
| 10    | FU | Mult1 | M[45+R3] |    | Add2 | F6-F2 | Mult2 |     |     |

DIV.D

F6

Mult1

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     | 10   | 11     |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### Reservation Stations:

| Time | Name  |
|------|-------|
|      | Add1  |
|      | Add2  |
|      | Add3  |
| 4    | Mult1 |
|      | Mult2 |

|      |       | S1 | S2 | RS    | RS |
|------|-------|----|----|-------|----|
| Busy | Op    | Vj | Vk | Qj    | Qk |
| No   |       |    |    |       |    |
| No   |       |    |    |       |    |
| No   |       |    |    |       |    |
| Yes  | MUL.D | F2 | F4 |       |    |
| Yes  | DIV.D |    | F6 | Mult1 |    |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 11    | FU | Mult1 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | F0  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     | 10   | 11     |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
|      | Add1  |
|      | Add2  |
|      | Add3  |
| 3    | Mult1 |
|      | Mult2 |

|   |      |       | S1 | S2 | RS    | RS |
|---|------|-------|----|----|-------|----|
|   | Busy | Op    | Vj | Vk | Qj    | Qk |
|   | No   |       |    |    |       |    |
|   | No   |       |    |    |       |    |
|   | No   |       |    |    |       |    |
|   | Yes  | MUL.D | F2 | F4 |       |    |
| ſ | Yes  | DIV.D |    | F6 | Mult1 |    |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 12    | FU | Mult1 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | FO  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     | 10   | 11     |

#### Load buffers:

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
|      | Add1  |
|      | Add2  |
|      | Add3  |
| 2    | Mult1 |
|      | Mult2 |

|      |    | S1 | S2 | RS | RS |
|------|----|----|----|----|----|
| Busy | Op | Vj | Vk | Qj | Qk |

| _ | Busy | Op    | Vj | Vk | Qj    | Qk |
|---|------|-------|----|----|-------|----|
|   | No   |       |    |    |       |    |
|   | No   |       |    |    |       |    |
|   | No   |       |    |    |       |    |
|   | Yes  | MUL.D | F2 | F4 |       |    |
|   | Yes  | DIV.D |    | F6 | Mult1 |    |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 13    | FU | Mult1 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

| Instruction |     | j  | k  | Issue | Exec | Write  |
|-------------|-----|----|----|-------|------|--------|
|             |     |    |    |       | comp | result |
| L.D         | F6  | 34 | R2 | 1     | 3    | 4      |
| L.D         | F2  | 45 | R3 | 2     | 4    | 5      |
| MUL.D       | FO  | F2 | F4 | 3     |      |        |
| SUB.D       | F8  | F6 | F2 | 4     | 7    | 8      |
| DIV.D       | F10 | F0 | F6 | 5     |      |        |
| ADD.D       | F6  | F8 | F2 | 6     | 10   | 11     |

#### **Load buffers:**

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 | No   |         |

#### **Reservation Stations:**

| Time | Name  |
|------|-------|
|      | Add1  |
|      | Add2  |
|      | Add3  |
| 1    | Mult1 |
|      | Mult2 |

|   |      |       | S1 | S2 | RS    | RS |
|---|------|-------|----|----|-------|----|
| _ | Busy | Op    | Vj | Vk | Qj    | Qk |
|   | No   |       |    |    |       |    |
| [ | No   |       |    |    |       |    |
|   | No   |       |    |    |       |    |
| [ | Yes  | MUL.D | F2 | F4 |       |    |
| ſ | Yes  | DIV.D |    | F6 | Mult1 |    |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 14    | FU | Mult1 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

| Instruction sta | tus table: |      | _     |           |      |        | _  | Load buffe | rs:     |         |
|-----------------|------------|------|-------|-----------|------|--------|----|------------|---------|---------|
| Instruction     |            | j    | k     | Issue     | Exec | Write  |    |            | Busy    | Address |
|                 |            |      |       |           | comp | result |    | _          |         |         |
| L.D             | F6         | 34   | R2    | 1         | 3    | 4      |    | Load1      | No      |         |
| L.D             | F2         | 45   | R3    | 2         | 4    | 5      |    | Load2      | No      |         |
| MUL.D           | F0         | F2   | F4    | 3         | 15   |        |    | Load3      | No      |         |
| SUB.D           | F8         | F6   | F2    | 4         | 7    | 8      |    | _          |         |         |
| DIV.D           | F10        | FO   | F6    | 5         |      |        |    |            |         |         |
| ADD.D           | F6         | F8   | F2    | 6         | 10   | 11     |    | Mul        | t1 (MU  | IL.D)   |
|                 |            |      | _     |           |      |        | -  |            | •       | •       |
| Reservation St  | ations:    |      |       | <b>S1</b> | S2   | RS     | RS | COIII      | pleting | 3       |
| Time            | Name       | Busy | Op    | Vj        | Vk   | Qj     | Qk | _          |         |         |
|                 | Add1       | No   |       |           |      |        |    |            |         |         |
|                 | Add2       | No   |       |           |      |        |    |            |         |         |
|                 | Add3       | No   |       |           |      |        |    |            |         |         |
| 0               | Mult1      | Yes  | MUL.D | F2        | F4   |        |    |            |         |         |
|                 | Mult2      | Yes  | DIV.D |           | F6   | Mult1  |    |            |         |         |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 15    | FU | Mult1 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

| Instruction sta | atus table: |      |       |       |      |        | _  | Load buffe | rs:      |            |
|-----------------|-------------|------|-------|-------|------|--------|----|------------|----------|------------|
| Instruction     |             | j    | k     | Issue | Exec | Write  | 1  |            | Busy     | Address    |
|                 |             |      |       |       | comp | result |    | _          |          |            |
| L.D             | F6          | 34   | R2    | 1     | 3    | 4      | 1  | Load1      | No       |            |
| L.D             | F2          | 45   | R3    | 2     | 4    | 5      | ]  | Load2      | No       |            |
| MUL.D           | FO          | F2   | F4    | 3     | 15   | 16     | ]  | Load3      | No       |            |
| SUB.D           | F8          | F6   | F2    | 4     | 7    | 8      | 1  | _          |          |            |
| DIV.D           | F10         | F0   | F6    | 5     |      |        | ]  |            |          |            |
| ADD.D           | F6          | F8   | F2    | 6     | 10   | 11     | 1  | Now        | / Mult2  | 2 (DIV.D)  |
|                 |             |      | ·     |       |      |        |    |            |          |            |
| Reservation S   | tations:    |      |       | S1    | S2   | RS     | RS | can        | start (i | atency 40) |
| Time            | Name        | Busy | Op    | Vj    | Vk   | Qj     | Qk | _          |          |            |
|                 | Add1        | No   |       |       |      |        |    |            |          |            |
|                 | Add2        | No   |       |       |      |        |    |            |          |            |
|                 | Add3        | No   |       |       |      |        |    |            |          |            |
|                 | Mult1       | No   |       |       |      |        |    |            |          |            |
| 40              | Mult2       | Yes  | DIV.D | F0    | F6   |        |    |            |          |            |
|                 |             |      |       |       |      |        |    | _          |          |            |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 16    | FU | F2*F4 | M[45+R3] |    | F8+F2 | F6-F2 | Mult2 |     |     |

# Faster than light computation (skip a couple of cycles)

| Instruction sta        | tus table: |      |       |       |           |        | _  | Load buffe | rs:      |         |
|------------------------|------------|------|-------|-------|-----------|--------|----|------------|----------|---------|
| Instruction            |            | j    | k     | Issue | Exec      | Write  |    |            | Busy     | Address |
|                        |            |      |       |       | comp      | result |    | _          |          |         |
| L.D                    | F6         | 34   | R2    | 1     | 3         | 4      |    | Load1      | No       |         |
| L.D                    | F2         | 45   | R3    | 2     | 4         | 5      |    | Load2      | No       |         |
| MUL.D                  | F0         | F2   | F4    | 3     | 15        | 16     |    | Load3      | No       |         |
| SUB.D                  | F8         | F6   | F2    | 4     | 7         | 8      |    |            |          |         |
| DIV.D                  | F10        | F0   | F6    | 5     | 56        |        |    |            |          |         |
| ADD.D                  | F6         | F8   | F2    | 6     | 10        | 11     |    | Mu         | lt2 (DI\ | /.D)    |
| Decemention St         | -tions.    |      |       | S1    | <b>S2</b> | RS     | RS |            | pleting  |         |
| Reservation St<br>Time | Name       | Busy | Op    | Vj    | Vk        | Qj     | Qk |            |          |         |
| Time                   | Add1       | No   | Ор    | Vj    | VK        | ٧      | Qκ | 1          |          |         |
|                        | Add1       | No   |       |       |           |        |    |            |          |         |
|                        | Add3       | No   |       |       | -         |        |    |            |          |         |
|                        | Mult1      | No   |       |       |           |        |    |            |          |         |
| 0                      | Mult2      | Yes  | DIV.D | FO    | F6        |        |    |            |          |         |
| U                      | wittz      | 163  | טוע.ט | FU    | FU        |        |    | I          |          |         |
| Register result        | status:    |      |       |       |           |        |    |            |          |         |

Clock

56

F6

F8+F2

F4

F8

F6-F2

F10

Mult2

F12

F14

F2

M[45+R3]

F0

| Instruction sta       | atus table: |      |    |       |      |        | <u>I</u>   | oad buffe        | rs:          |           |  |
|-----------------------|-------------|------|----|-------|------|--------|------------|------------------|--------------|-----------|--|
| Instruction           |             | j    | k  | Issue | Exec | Write  |            |                  | Busy         | Address   |  |
|                       |             |      |    |       | comp | result |            | _                |              |           |  |
| L.D                   | F6          | 34   | R2 | 1     | 3    | 4      |            | Load1            | No           |           |  |
| L.D                   | F2          | 45   | R3 | 2     | 4    | 5      |            | Load2            | No           |           |  |
| MUL.D                 | F0          | F2   | F4 | 3     | 15   | 16     |            | Load3            | No           |           |  |
| SUB.D                 | F8          | F6   | F2 | 4     | 7    | 8      |            | _                |              |           |  |
| DIV.D                 | F10         | F0   | F6 | 5     | 56   | 57     |            |                  |              |           |  |
| ADD.D                 | F6          | F8   | F2 | 6     | 10   | 11     |            | All instructions |              |           |  |
|                       |             |      |    |       |      |        |            |                  |              |           |  |
| Reservation Stations: |             |      | S1 | S2    | RS   | RS     | completed: |                  |              |           |  |
| Time                  | Name        | Busy | Op | Vj    | Vk   | Qj     | Qk         | In-o             | sue, out-of- |           |  |
|                       | Add1        | No   |    |       |      |        |            | orde             | er exec      | ution and |  |
|                       | Add2        | No   |    |       |      |        |            | out-             | of-ord       | er        |  |
|                       | Add3        | No   |    |       |      |        |            |                  |              |           |  |
|                       | Mult1       | No   |    |       |      |        |            | COM              | pletion      | 1         |  |
|                       | Mult2       | No   |    |       |      |        |            |                  |              |           |  |

| Clock | _  | F0    | F2       | F4 | F6    | F8    | F10   | F12 | F14 |
|-------|----|-------|----------|----|-------|-------|-------|-----|-----|
| 57    | FU | F2*F4 | M[45+R3] |    | F8+F2 | F6-F2 | FO/F6 |     |     |

## Outline

- 3.1 ILP Background
- 3.2 Basic Compiler Techniques for ILP
- 3.3 Branch Prediction
- 3.4 Data Hazards and Dynamic Scheduling
- 3.5 Dynamic Scheduling Algorithm
- 3.6 Hardware-Based Speculation