## CS252 Graduate Computer Architecture

### Lecture 16: Instruction Level Parallelism and Dynamic Execution #1:

March 16, 2001
Prof. David A. Patterson
Computer Science 252
Spring 2001

### Recall from Pipelining Review

- Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
  - <u>Ideal pipeline CPI</u>: measure of the maximum performance attainable by the implementation
  - <u>Structural hazards</u>: HW cannot support this combination of instructions
  - <u>Data hazards</u>: Instruction depends on result of prior instruction still in the pipeline
  - <u>Control hazards</u>: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
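Each stall term in the CPI equation above is measured in stall cycles per instruction, so the terms simply add. Below is a minimal Python sketch of that arithmetic. The stall rates are made-up numbers for illustration, not measurements. The speedup line assumes the same clock rate and instruction count, and it idealizes by removing control stalls entirely.

```python
def pipeline_cpi(ideal: float, structural: float, data: float, control: float) -> float:
    """Pipeline CPI = ideal CPI + structural + data hazard + control stall cycles per instruction."""
    return ideal + structural + data + control

# Hypothetical per-instruction stall rates, chosen only for illustration.
base = pipeline_cpi(ideal=1.0, structural=0.05, data=0.20, control=0.15)
# Same machine if control stalls were eliminated entirely (an idealized assumption).
no_control = pipeline_cpi(ideal=1.0, structural=0.05, data=0.20, control=0.0)

print(round(base, 2), round(no_control, 2))   # 1.4 1.25
print(round(base / no_control, 2))            # 1.12 (speedup, same clock and instruction count)
```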

### Ideas to Reduce Stalls

Chapter 4

| Technique                                 | Reduces                             |
|-------------------------------------------|-------------------------------------|
| Dynamic scheduling                        | Data hazard stalls                  |
| Dynamic branch prediction                 | Control stalls                      |
| Issuing multiple instructions per cycle   | Ideal CPI                           |
| Speculation                               | Data and control stalls             |
| Dynamic memory disambiguation             | Data hazard stalls involving memory |
| Loop unrolling                            | Control hazard stalls               |
| Basic compiler pipeline scheduling        | Data hazard stalls                  |
| Compiler dependence analysis              | Ideal CPI and data hazard stalls    |
| Software pipelining and trace scheduling  | Ideal CPI and data hazard stalls    |
| Compiler speculation                      | Ideal CPI, data and control stalls  |

### Instruction-Level Parallelism (ILP)

- Basic Block (BB) ILP is quite small
  - BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
  - average dynamic branch frequency 15% to 25%
    - 4 to 7 instructions execute between a pair of branches
  - Plus instructions in BB likely to depend on each other
- To obtain substantial performance enhancements, we must exploit ILP across multiple basic blocks
- Simplest: <u>loop-level parallelism</u> to exploit parallelism among iterations of a loop
  - Vector is one way
  - If not vector, then either dynamic via branch prediction or static via loop unrolling by compiler
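The 4-to-7 figure follows directly from the branch frequency. If a fraction f of executed instructions are branches, roughly one instruction in every 1/f is a branch. The average run between branches, counting the branch itself, is therefore about 1/f. A small Python check of that arithmetic (`avg_basic_block_length` is a hypothetical helper name):

```python
def avg_basic_block_length(branch_freq: float) -> float:
    """Average dynamic run length per branch, counting the branch itself: 1 / f."""
    return 1.0 / branch_freq

for f in (0.25, 0.15):
    print(f"{f:.0%} branches -> about {avg_basic_block_length(f):.1f} instructions between branches")
# 25% branches -> about 4.0 instructions between branches
# 15% branches -> about 6.7 instructions between branches
```

With so few instructions per block, many of them dependent on each other, there is little to overlap inside one block. That is why the slide turns to parallelism across blocks.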

### Data Dependence and Hazards

- Instr<sub>J</sub> is data dependent on Instr<sub>I</sub>: Instr<sub>J</sub> tries to read an operand before Instr<sub>I</sub> writes it

```
I: add r1,r2,r3
J: sub r4,r1,r3
```

- or Instr<sub>J</sub> is data dependent on Instr<sub>K</sub>, which is data dependent on Instr<sub>I</sub>
- Caused by a "True Dependence" (compiler term)
- If true dependence caused a hazard in the pipeline, called a Read After Write (RAW) hazard
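The RAW condition is a check on register names. Below is a minimal Python sketch. It assumes MIPS-style operand order, with the destination first, so `add r1,r2,r3` writes r1 and reads r2 and r3. The `Instr` tuple, the `raw_dependent` helper, and the third instruction `K` are hypothetical, added for illustration.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dst: str               # register written (first operand, MIPS style)
    srcs: Tuple[str, ...]  # registers read

def raw_dependent(i: Instr, j: Instr) -> bool:
    """True dependence: later instruction j reads the register that earlier instruction i writes."""
    return i.dst in j.srcs

I = Instr("add", "r1", ("r2", "r3"))
J = Instr("sub", "r4", ("r1", "r3"))
K = Instr("mul", "r6", ("r4", "r7"))  # hypothetical third instruction reading J's result

print(raw_dependent(I, J))  # True: J reads r1, which I writes
print(raw_dependent(J, K))  # True: K reads r4, which J writes
print(raw_dependent(I, K))  # False: K depends on I only through the chain I -> J -> K
```

This sketch only checks direct pairs. It does not account for an intervening instruction that redefines the same register.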

### Data Dependence and Hazards

- Dependences are a property of programs
- Presence of dependence indicates potential for a hazard, but actual hazard and length of any stall is a property of the pipeline
- Importance of data dependences:
  1. indicates the possibility of a hazard
  2. determines the order in which results must be calculated
  3. sets an upper bound on how much parallelism can possibly be exploited
- Today: HW schemes to avoid hazards

### Name Dependence #1: Anti-dependence

- Name dependence: when 2 instructions use same register or memory location, called a name, but no flow of data between the instructions associated with that name; 2 versions of name dependence
- Instr<sub>J</sub> writes operand <u>before</u> Instr<sub>I</sub> reads it

```
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
```

- Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
- If anti-dependence caused a hazard in the pipeline, called a Write After Read (WAR) hazard

### Name Dependence #2: Output dependence

- Instr<sub>J</sub> writes operand <u>before</u> Instr<sub>I</sub> writes it.

```
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
```

- Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
- If output dependence caused a hazard in the pipeline, called a Write After Write (WAW) hazard
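RAW, WAR, and WAW differ only in which pair of names collides. Below is a minimal Python sketch that classifies the dependences of a later instruction j on an earlier instruction i. It uses the same MIPS-style operand order (destination first). `Instr` and `hazards` are hypothetical helpers, and memory locations are ignored.

```python
from typing import NamedTuple, Set, Tuple

class Instr(NamedTuple):
    op: str
    dst: str               # register written (first operand)
    srcs: Tuple[str, ...]  # registers read

def hazards(i: Instr, j: Instr) -> Set[str]:
    """Dependences of later instruction j on earlier instruction i, by hazard name."""
    found = set()
    if i.dst in j.srcs:
        found.add("RAW")  # true dependence: j reads what i writes
    if j.dst in i.srcs:
        found.add("WAR")  # anti-dependence: j writes what i reads
    if j.dst == i.dst:
        found.add("WAW")  # output dependence: both write the same name
    return found

# Anti-dependence slide: I: sub r4,r1,r3   J: add r1,r2,r3   K: mul r6,r1,r7
I = Instr("sub", "r4", ("r1", "r3"))
J = Instr("add", "r1", ("r2", "r3"))
K = Instr("mul", "r6", ("r1", "r7"))
print(hazards(I, J), hazards(J, K))  # {'WAR'} {'RAW'}

# Output-dependence slide: I: sub r1,r4,r3   J: add r1,r2,r3
I_out = Instr("sub", "r1", ("r4", "r3"))
print(hazards(I_out, J))  # {'WAW'}
```

The classifier only finds dependences. Whether one becomes a hazard, and how long it stalls, depends on the pipeline.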

### ILP and Data Hazards

- HW/SW must preserve program order: the order in which instructions would execute if executed sequentially, one at a time, as determined by the original source program
- HW/SW goal: exploit parallelism by preserving program order only where it affects the outcome of the program
- Instructions involved in a name dependence can execute simultaneously if name used in instructions is changed so instructions do not conflict
  - Register renaming resolves name dependence for regs
  - Either by compiler or by HW
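Below is a minimal Python sketch of register renaming. Each destination gets a fresh name, and each source reads the latest name for its register. It assumes an unbounded supply of fresh registers named p0, p1, and so on; these names are hypothetical. This is a generic illustration only: Tomasulo's hardware scheme, described later, uses reservation-station tags instead.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name; sources read the latest mapping.

    program: list of (op, dst, srcs) tuples using architectural register names.
    Sources never written earlier keep their original names (initial values).
    """
    fresh = (f"p{n}" for n in count())
    mapping = {}  # architectural name -> current renamed register
    out = []
    for op, dst, srcs in program:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)  # read before redefining dst
        mapping[dst] = next(fresh)
        out.append((op, mapping[dst], new_srcs))
    return out

prog = [("sub", "r1", ("r4", "r3")),  # I
        ("add", "r1", ("r2", "r3")),  # J: WAW with I on r1
        ("mul", "r6", ("r1", "r7"))]  # K: reads J's r1 (true dependence)
for instr in rename(prog):
    print(instr)
# ('sub', 'p0', ('r4', 'r3'))
# ('add', 'p1', ('r2', 'r3'))
# ('mul', 'p2', ('p1', 'r7'))
```

After renaming, I and J no longer share a destination, so the WAW conflict is gone. K still reads J's result, so the true dependence is preserved.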

### Control Dependencies

 Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order

```
if p1 {
    S1;
}
if p2 {
    S2;
}
```

- S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

### Control Dependence Ignored

- Control dependence need not be preserved
  - willing to execute instructions that should not have been executed, thereby violating the control dependences, if can do so without affecting correctness of the program
- Instead, 2 properties critical to program correctness are exception behavior and data flow

### Exception Behavior

 Preserving exception behavior => any changes in instruction execution order must not change how exceptions are raised in program (=> no new exceptions)

Example:

```
DADDU R2,R3,R4
BEQZ R2,L1
LW R1,0(R2)
L1:
```

Problem with moving LW before BEQZ?

### Data Flow

- Data flow: actual flow of data values among instructions that produce results and those that consume them
  - branches make flow dynamic, determine which instruction is supplier of data
- Example:

```
DADDU R1,R2,R3
BEQZ R4,L
DSUBU R1,R5,R6
L: ...
OR R7,R1,R8
```

- OR depends on DADDU or DSUBU?
- Must preserve data flow on execution
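The answer to the question above depends on the branch outcome, which is why data flow must be preserved. Below is a minimal Python sketch that executes the sequence in program order. It assumes registers are plain integers, with no 64-bit wraparound. `run` is a hypothetical helper that also reports which instruction supplied R1 to the OR.

```python
def run(r2, r3, r4, r5, r6, r8):
    """Execute the slide's sequence in program order; return (R7, supplier of R1)."""
    r1, supplier = r2 + r3, "DADDU"     # DADDU R1,R2,R3
    if r4 != 0:                         # BEQZ R4,L is taken (skipping DSUBU) only when R4 == 0
        r1, supplier = r5 - r6, "DSUBU" # DSUBU R1,R5,R6
    r7 = r1 | r8                        # L: OR R7,R1,R8
    return r7, supplier

print(run(r2=1, r3=2, r4=0, r5=10, r6=3, r8=0))  # (3, 'DADDU')
print(run(r2=1, r3=2, r4=1, r5=10, r6=3, r8=0))  # (7, 'DSUBU')
```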

### CS 252 Administrivia

- Project Group Meetings Next Wed March 21
  - No lecture next Wednesday
- Email Project Survey #2 by Monday evening
- Fill out signup sheet for Wednesday discussion

### Advantages of Dynamic Scheduling

- Handles cases when dependences unknown at compile time
  - (e.g., because they may involve a memory reference)
- It simplifies the compiler
- Allows code that was compiled for one pipeline to run efficiently on a different pipeline
- Enables hardware speculation, a technique with significant performance advantages that builds on dynamic scheduling

### HW Schemes: Instruction Parallelism

Key idea: Allow instructions behind stall to proceed

```
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
```

- Enables out-of-order execution and allows out-of-order completion
- Will distinguish when an instruction begins execution and when it completes execution; between 2 times, the instruction is in execution
- In a dynamically scheduled pipeline, all instructions pass through issue stage in order (in-order issue)
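Below is a minimal Python timing sketch for the three instructions above, showing why letting instructions behind a stall proceed helps. It makes these assumptions:

- One instruction issues per clock, in order.
- Functional units are unlimited.
- Latencies follow the example speeds quoted later in the lecture: 3 clocks for add/subtract and 40 for divide.

WAR and WAW hazards are ignored because this sequence has none. `start_cycles` is a hypothetical helper.

```python
LATENCY = {"DIVD": 40, "ADDD": 3, "SUBD": 3}  # clocks; example speeds from the lecture

PROGRAM = [
    ("DIVD", "F0", ("F2", "F4")),
    ("ADDD", "F10", ("F0", "F8")),   # RAW on F0: must wait for DIVD
    ("SUBD", "F12", ("F8", "F14")),  # independent of DIVD
]

def start_cycles(program, in_order):
    """Clock at which each instruction begins execution.

    Instruction k issues at clock k and can start no earlier than clock k + 1,
    once all its source registers are ready. With in_order=True it also cannot
    start before the previous instruction started.
    """
    ready = {}       # register -> clock its new value is available
    starts = []
    prev_start = 0
    for issue, (op, dst, srcs) in enumerate(program):
        start = max([issue + 1] + [ready.get(r, 0) for r in srcs])
        if in_order:
            start = max(start, prev_start)
        ready[dst] = start + LATENCY[op]
        starts.append(start)
        prev_start = start
    return starts

print(start_cycles(PROGRAM, in_order=True))   # [1, 41, 41]
print(start_cycles(PROGRAM, in_order=False))  # [1, 41, 3]
```

In the out-of-order case, SUBD finishes at clock 6, long before ADDD finishes at clock 44. This is the out-of-order completion the slide mentions.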

### Dynamic Scheduling Step 1

- Simple pipeline had 1 stage to check both structural and data hazards: Instruction Decode (ID), also called Instruction Issue
- Split the ID pipe stage of simple 5-stage pipeline into 2 stages:
  - Issue: decode instructions, check for structural hazards
  - Read operands: wait until no data hazards, then read operands

### A Dynamic Algorithm: Tomasulo's Algorithm

- For IBM 360/91 (before caches!)
- Goal: high performance without special compilers
- Small number of floating point registers (4 in 360) prevented interesting compiler scheduling of operations
  - This led Tomasulo to try to figure out how to get more effective registers: renaming in hardware!
- Why study a 1966 computer?
  - The descendants of this have flourished!
  - Alpha 21264, HP 8000, MIPS 10000, Pentium III, PowerPC 604, ...

### Tomasulo Algorithm

- Control & buffers <u>distributed</u> with Function Units (FU)
  - FU buffers called "<u>reservation stations</u>"; have pending operands
- Registers in instructions replaced by values or pointers to reservation stations (RS); called <u>register renaming</u>
  - avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't
- Results to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs
- Loads and stores treated as FUs with RSs as well
- Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue

### Tomasulo Organization



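*[Figure: Tomasulo organization, with functional units and their reservation stations connected by the Common Data Bus (CDB)]*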

### Reservation Station Components

Op: Operation to perform in the unit (e.g., + or -)

Vj, Vk: Values of source operands

- Store buffers have a V field, the result to be stored

Qj, Qk: Reservation stations producing the source registers (value to be written)

- Note: Qj, Qk = 0 => ready
- Store buffers only have Qi for the RS producing the result

Busy: Indicates reservation station or FU is busy

Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.
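Below is a minimal Python sketch of these fields and of the issue-time renaming they enable. A source with a pending writer records that writer's name in Q; otherwise its value is copied into V. It makes these assumptions:

- `None` stands in for the slides' Qj, Qk = 0.
- Values are symbolic strings such as "M(A1)".
- `ReservationStation` and `issue` are hypothetical names.

The example is DIVD F10,F0,F6 issuing while MULTD in Mult1 still owns F0 and F6 already holds M(A1). Because DIVD keeps M(A1) in Vk, a later ADDD that writes F6 cannot disturb it.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ReservationStation:
    name: str                 # e.g. "Add1", "Mult2"
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # source operand values (symbolic strings here)
    vk: Optional[str] = None
    qj: Optional[str] = None  # producing RS; None plays the role of Qj = 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        """Both operands present (Qj, Qk = 0 in the slides' notation)."""
        return self.busy and self.qj is None and self.qk is None

def issue(rs: ReservationStation, op: str, dst: str, src_j: str, src_k: str,
          reg_status: Dict[str, str], reg_values: Dict[str, str]) -> None:
    """Issue into a free station, renaming sources through the register result status."""
    if rs.busy:
        raise RuntimeError("structural hazard: reservation station not free")
    rs.busy, rs.op = True, op
    # Pending writer -> record its name in Q; otherwise copy the value into V.
    rs.qj = reg_status.get(src_j)
    rs.vj = None if rs.qj else reg_values[src_j]
    rs.qk = reg_status.get(src_k)
    rs.vk = None if rs.qk else reg_values[src_k]
    reg_status[dst] = rs.name  # dst will now be produced by this station

status = {"F0": "Mult1"}      # MULTD in Mult1 will write F0
values = {"F6": "M(A1)"}      # F6 already holds the value loaded by Load1
mult2 = ReservationStation("Mult2")
issue(mult2, "DIVD", "F10", "F0", "F6", status, values)
print(mult2.qj, mult2.vk, status["F10"], mult2.ready())  # Mult1 M(A1) Mult2 False
```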

### Three Stages of Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue

   If a reservation station is free (no structural hazard), control issues the instruction and sends operands (renames registers).

2. Execute: operate on operands (EX)

   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.

3. Write result: finish execution (WB)

   Write on the Common Data Bus to all awaiting units; mark the reservation station available.

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast
- Example speed: 3 clocks for FP +, -; 10 for \*; 40 clocks for /
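Below is a minimal Python sketch of the write-result stage as a "come from" broadcast. The producer puts its tag (its FU/RS name) and the value on the bus. Every reservation station and the register result status compare that tag with what they are waiting for. It assumes `None` for Q = 0 and symbolic values; `Station` and `write_result` are hypothetical names. The starting state mirrors the example at the moment Add1 (SUBD) writes its result, M(A1) - M(A2), written "(M-M)".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Station:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None  # None means the operand is already in vj
    qk: Optional[str] = None

def write_result(tag: str, value: str, stations: List[Station],
                 reg_status: Dict[str, str], reg_values: Dict[str, str]) -> None:
    """Broadcast (source tag, value) on the CDB; every listener compares tags."""
    for rs in stations:
        if rs.qj == tag:                  # a waiting station grabs the value
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        reg_values[reg] = value           # register file listens too
        del reg_status[reg]
    for rs in stations:
        if rs.name == tag:                # producing station becomes available
            rs.busy, rs.op, rs.vj, rs.vk = False, None, None, None

stations = [
    Station("Add1", True, "SUBD", "M(A1)", "M(A2)"),
    Station("Add2", True, "ADDD", None, "M(A2)", qj="Add1"),
    Station("Mult1", True, "MULTD", "M(A2)", "R(F4)"),
    Station("Mult2", True, "DIVD", None, "M(A1)", qj="Mult1"),
]
status = {"F0": "Mult1", "F6": "Add2", "F8": "Add1", "F10": "Mult2"}
values: Dict[str, str] = {}
write_result("Add1", "(M-M)", stations, status, values)
print(stations[1].vj, stations[1].qj)  # (M-M) None
print(values, status)  # {'F8': '(M-M)'} {'F0': 'Mult1', 'F6': 'Add2', 'F10': 'Mult2'}
```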

### Tomasulo Example

The example walks the instruction stream below through the machine one clock at a time. At each clock it shows instruction status, reservation stations, and register result status. M(A1) and M(A2) are the values loaded by Load1 and Load2, and R(F4) is the value read from register F4. (M-M) is the SUBD result M(A1) - M(A2).

Notes from the early clocks:

- Clock 2: Can have multiple loads outstanding.
- Clock 3: Register names are removed ("renamed") in reservation stations; MULTD issued.
- Clock 4: Load1 completing; what is waiting for Load1?
- Clock 5: Load2 completing; what is waiting for Load2?
- Clock 6: Timer starts counting down for Add1, Mult1. Issue ADDD here despite the name dependence on F6?

#### Clock 7: Add1 (SUBD) completing; what is waiting for it?

Instruction status (load buffers Load1, Load2, Load3 all not busy):

| Instruction | j   | k  | Issue | Exec Comp | Write Result |
|-------------|-----|----|-------|-----------|--------------|
| LD F6       | 34+ | R2 | 1     | 3         | 4            |
| LD F2       | 45+ | R3 | 2     | 4         | 5            |
| MULTD F0    | F2  | F4 | 3     |           |              |
| SUBD F8     | F6  | F2 | 4     |           |              |
| DIVD F10    | F0  | F6 | 5     |           |              |
| ADDD F6     | F8  | F2 | 6     |           |              |

Reservation stations:

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|------|-------|------|-------|-------|-------|-------|----|
| 0    | Add1  | Yes  | SUBD  | M(A1) | M(A2) |       |    |
|      | Add2  | Yes  | ADDD  |       | M(A2) | Add1  |    |
|      | Add3  | No   |       |       |       |       |    |
| 8    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

Register result status:

|    | F0    | F2    | F4 | F6   | F8   | F10   | F12-F30 |
|----|-------|-------|----|------|------|-------|---------|
| FU | Mult1 | M(A2) |    | Add2 | Add1 | Mult2 |         |

#### Clock 8

Instruction status as at clock 7, except that SUBD writes its result in clock 8.

Reservation stations:

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|------|-------|------|-------|-------|-------|-------|----|
|      | Add1  | No   |       |       |       |       |    |
| 2    | Add2  | Yes  | ADDD  | (M-M) | M(A2) |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 7    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

Register result status:

|    | F0    | F2    | F4 | F6   | F8 | F10   | F12-F30 |
|----|-------|-------|----|------|----|-------|---------|
| FU | Mult1 | M(A2) |    | Add2 |    | Mult2 |         |

#### Clock 9

Instruction status as at clock 8.

Reservation stations:

| Time | Name  | Busy | Op    | Vj    | Vk    | Qj    | Qk |
|------|-------|------|-------|-------|-------|-------|----|
|      | Add1  | No   |       |       |       |       |    |
| 1    | Add2  | Yes  | ADDD  | (M-M) | M(A2) |       |    |
|      | Add3  | No   |       |       |       |       |    |
| 6    | Mult1 | Yes  | MULTD | M(A2) | R(F4) |       |    |
|      | Mult2 | Yes  | DIVD  |       | M(A1) | Mult1 |    |

Instruction status:

| Dest | j | k | Issue | Exec Comp | Write Result | | Name | Busy | Address |
|---|---|---|---|---|---|---|---|---|---|
| F6 | 34+ | R2 | 1 | 3 | 4 | | Load1 | No | |
| F2 | 45+ | R3 | 2 | 4 | 5 | | Load2 | No | |
| F0                    | F2                      | F4                                                                                         | 3     |                                                                                                                                                         |                                                                                                                                         |                                                                                                                                                     | Load3 | No                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| F8                    | F6                      | F2                                                                                         | 4     | 7                                                                                                                                                       | 8                                                                                                                                       |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| F10                   | FO                      | F6                                                                                         | 5     |                                                                                                                                                         |                                                                                                                                         |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| F6                    | F8                      | F2                                                                                         | 6     | 10                                                                                                                                                      |                                                                                                                                         |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| Reservation Stations: |                         |                                                                                            |       |                                                                                                                                                         | <i>S</i> 2                                                                                                                              | RS                                                                                                                                                  | RS    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| Time                  | Name                    | Busy                                                                                       | Op    | Vj                                                                                                                                                      | Vk                                                                                                                                      | Qj                                                                                                                                                  | Qk    |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
|                       | Add1                    | No                                                                                         |       |                                                                                                                                                         |                                                                                                                                         |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| 0 Add2   Yes ADDD     |                         |                                                                                            |       | (M-M)                                                                                                                                                   | M(A2)                                                                                                                                   |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
|                       | Add3                    | No                                                                                         |       |                                                                                                                                                         |                                                                                                                                         |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
| 5 Mult1 Yes MULTI     |                         |                                                                                            |       | M(A2)                                                                                                                                                   | R(F4)                                                                                                                                   |                                                                                                                                                     |       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                                                                                                                                                                                                                                                                              |
|                       | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

- Add2 (ADDD) completing; what is waiting for it?

Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     |           |              | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 4    | Mult1 | Yes  | MULTD | M(A2)   | R(F4)   |         |         |
|      | Mult2 | Yes  | DIVD  |         | M(A1)   | Mult1   |         |

#### Register result status:



- Write result of ADDD here?
- All quick instructions complete in this cycle!

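Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     |           |              | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 3    | Mult1 | Yes  | MULTD | M(A2)   | R(F4)   |         |         |
|      | Mult2 | Yes  | DIVD  |         | M(A1)   | Mult1   |         |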

#### Register result status:



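Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     |           |              | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 2    | Mult1 | Yes  | MULTD | M(A2)   | R(F4)   |         |         |
|      | Mult2 | Yes  | DIVD  |         | M(A1)   | Mult1   |         |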

#### Register result status:

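| Clock |    | F0    | F2    | F4 | F6      | F8    | F10   | F12 | ... | F30 |
|-------|----|-------|-------|----|---------|-------|-------|-----|-----|-----|
| 13    | FU | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |     |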

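Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     |           |              | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 1    | Mult1 | Yes  | MULTD | M(A2)   | R(F4)   |         |         |
|      | Mult2 | Yes  | DIVD  |         | M(A1)   | Mult1   |         |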

### Register result status:



Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     | 15        |              | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 0    | Mult1 | Yes  | MULTD | M(A2)   | R(F4)   |         |         |
|      | Mult2 | Yes  | DIVD  |         | M(A1)   | Mult1   |         |

### Register result status:

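| Clock |    | F0    | F2    | F4 | F6      | F8    | F10   | F12 | ... | F30 |
|-------|----|-------|-------|----|---------|-------|-------|-----|-----|-----|
| 15    | FU | Mult1 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |     |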

- Mult1 (MULTD) completing; what is waiting for it?
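To make "what is waiting for it?" concrete, here is a minimal Python sketch of the CDB broadcast as seen by the reservation stations. It assumes each entry carries the Vj/Vk/Qj/Qk fields from the tables and treats operand values as opaque strings; `Station` and `broadcast` are hypothetical names chosen for illustration, not a description of any particular machine's logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    """One reservation-station entry; field names follow the slide columns."""
    name: str
    op: str
    vj: Optional[str] = None  # first operand value, once known
    vk: Optional[str] = None  # second operand value, once known
    qj: Optional[str] = None  # tag of the entry that will produce Vj
    qk: Optional[str] = None  # tag of the entry that will produce Vk

    def ready(self) -> bool:
        # An entry can start executing once it waits on no tags.
        return self.qj is None and self.qk is None


def broadcast(stations: List[Station], tag: str, value: str) -> List[str]:
    """Put (tag, value) on the CDB; every entry waiting on `tag` captures it.

    Returns the names of entries that became ready because of this broadcast.
    """
    woken = []
    for s in stations:
        was_waiting = tag in (s.qj, s.qk)
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
        if was_waiting and s.ready():
            woken.append(s.name)
    return woken


# When Mult1 writes its result, Mult2 (DIVD) is the only station waiting on it.
divd = Station("Mult2", "DIVD", vk="M(A1)", qj="Mult1")
print(broadcast([divd], "Mult1", "M*F4"))  # ['Mult2']
print(divd.vj)                             # M*F4
```

Every entry compares the broadcast tag against its own Qj and Qk, which is why the CDB needs an associative match in every station.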

Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     | 15        | 16           | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op   | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|------|---------|---------|---------|---------|
|      | Add1  | No   |      |         |         |         |         |
|      | Add2  | No   |      |         |         |         |         |
|      | Add3  | No   |      |         |         |         |         |
|      | Mult1 | No   |      |         |         |         |         |
| 40   | Mult2 | Yes  | DIVD | M*F4    | M(A1)   |         |         |

Register result status:

| Clock |    | F0   | F2    | F4 | F6      | F8    | F10   | F12 | ... | F30 |
|-------|----|------|-------|----|---------|-------|-------|-----|-----|-----|
| 16    | FU | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |     |

- Just waiting for Mult2 (DIVD) to complete

# Faster than light computation (skip a couple of cycles)

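Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     | 15        | 16           | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     |           |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op   | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|------|---------|---------|---------|---------|
|      | Add1  | No   |      |         |         |         |         |
|      | Add2  | No   |      |         |         |         |         |
|      | Add3  | No   |      |         |         |         |         |
|      | Mult1 | No   |      |         |         |         |         |
| 1    | Mult2 | Yes  | DIVD | M*F4    | M(A1)   |         |         |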

### Register result status:



Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result | Load buffer | Busy | Address |
|-------------|-----|-----|----|-------|-----------|--------------|-------------|------|---------|
| LD          | F6  | 34+ | R2 | 1     | 3         | 4            | Load1       | No   |         |
| LD          | F2  | 45+ | R3 | 2     | 4         | 5            | Load2       | No   |         |
| MULTD       | F0  | F2  | F4 | 3     | 15        | 16           | Load3       | No   |         |
| SUBD        | F8  | F6  | F2 | 4     | 7         | 8            |             |      |         |
| DIVD        | F10 | F0  | F6 | 5     | 56        |              |             |      |         |
| ADDD        | F6  | F8  | F2 | 6     | 10        | 11           |             |      |         |

Reservation Stations:

| Time | Name  | Busy | Op   | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|------|---------|---------|---------|---------|
|      | Add1  | No   |      |         |         |         |         |
|      | Add2  | No   |      |         |         |         |         |
|      | Add3  | No   |      |         |         |         |         |
|      | Mult1 | No   |      |         |         |         |         |
| 0    | Mult2 | Yes  | DIVD | M*F4    | M(A1)   |         |         |

### Register result status:

Mult2 (DIVD) is completing; what is waiting for it?



Register result status:


Once again: In-order issue, out-of-order execution and out-of-order completion.
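The clocks in this example can be reproduced with a small timing model. The sketch below is a simplification, assuming the latencies implied by the tables (2 clocks for LD, ADDD and SUBD; 10 for MULTD; 40 for DIVD), one issue per clock, enough free reservation stations, and no contention for the CDB; `LATENCY`, `PROGRAM` and `schedule` are illustrative names.

```python
# Assumed execution latencies in clocks, inferred from the tables above.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, FP source registers); load address registers are ignored.
PROGRAM = [
    ("LD", "F6", []),              # LD    F6, 34+R2
    ("LD", "F2", []),              # LD    F2, 45+R3
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]


def schedule(program):
    """Return (op, issue, exec_complete, write_result) for each instruction."""
    # register -> index of its latest producer (the register result status).
    # Entries are never cleared; that is harmless here because a value written
    # before an instruction issues is already available at issue + 1.
    producer = {}
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1  # in-order issue, one instruction per clock
        # A source is usable the clock after its producer writes on the CDB;
        # registers with no producer are read straight from the register file.
        ready = max((rows[producer[r]][3] + 1 for r in srcs if r in producer),
                    default=0)
        start = max(issue + 1, ready)
        complete = start + LATENCY[op] - 1
        rows.append((op, issue, complete, complete + 1))
        producer[dest] = i  # renaming: later readers of dest wait on this one
    return rows


for op, issue, comp, write in schedule(PROGRAM):
    print(f"{op:6} issue {issue:2}  comp {comp:2}  write {write:2}")
```

Issue is in order (1 through 6), but completion is not: ADDD writes at 11 while MULTD writes at 16 and DIVD at 57. In the model, DIVD takes F6 from the first LD (in the tables it already holds M(A1) in Vk), so the later ADDD that rewrites F6 never waits for DIVD: the WAR hazard disappears through renaming.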

### Tomasulo Drawbacks

- Complexity
  - delays of 360/91, MIPS 10000, Alpha 21264, IBM PPC 620 in CA:AQA 2/e, but not in silicon!
- Many associative stores (CDB) at high speed
- Performance limited by Common Data Bus
  - Each CDB must go to multiple functional units
    ⇒ high capacitance, high wiring density
  - Number of functional units that can complete per cycle limited to one!
    - Multiple CDBs ⇒ more FU logic for parallel associative stores
- Non-precise interrupts!
  - We will address this later
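The single-CDB limit can be illustrated with a toy arbiter. This is a hypothetical sketch, assuming results that want the same clock are served in list order and that a delayed result simply waits a cycle (a real design must also hold the functional unit that produced it); `arbitrate` is an illustrative name.

```python
from collections import defaultdict
from typing import List


def arbitrate(want: List[int], num_cdbs: int = 1) -> List[int]:
    """Clock at which each result actually goes out on a CDB.

    want[i] is the clock functional unit i would like to write its result.
    At most `num_cdbs` results can be broadcast per clock; earlier entries in
    the list are served first (a hypothetical fixed-priority policy).
    """
    busy = defaultdict(int)  # clock -> CDBs already claimed in that clock
    actual = []
    for clock in want:
        while busy[clock] >= num_cdbs:
            clock += 1  # every bus is taken this clock; the result waits
        busy[clock] += 1
        actual.append(clock)
    return actual


print(arbitrate([10, 10, 10]))     # [10, 11, 12]
print(arbitrate([10, 10, 10], 2))  # [10, 10, 11]
```

With one CDB, three units finishing together need three clocks to drain; adding a second bus halves the backlog but, as the slide notes, doubles the associative compare logic in every station.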

### Tomasulo Loop Example

| Label | Instruction | Operand 1 | Operand 2 | Operand 3 |
|-------|-------------|-----------|-----------|-----------|
| Loop: | LD          | F0        | 0         | R1        |
|       | MULTD       | F4        | F0        | F2        |
|       | SD          | F4        | 0         | R1        |
|       | SUBI        | R1        | R1        | #8        |
|       | BNEZ        | R1        | Loop      |           |

- This time assume Multiply takes 4 clocks
- Assume 1st load takes 8 clocks (L1 cache miss), 2nd load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
  - Reality: integer instructions run ahead of FP instructions
- Show 2 iterations
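For reference, the loop scales each double in an array by the scalar in F2, walking downward 8 bytes at a time. The plain-Python sketch below mirrors the five instructions one-for-one; it assumes memory is a dict keyed by byte address and that R1 starts at 80, the first load address in the tables. `run_loop` is a hypothetical helper and models no timing.

```python
def run_loop(mem, r1, f2):
    """Plain-Python equivalent of the loop; mem maps byte address -> double."""
    while True:
        f0 = mem[r1]     # LD    F0, 0(R1)
        f4 = f0 * f2     # MULTD F4, F0, F2
        mem[r1] = f4     # SD    F4, 0(R1)
        r1 = r1 - 8      # SUBI  R1, R1, #8
        if r1 == 0:      # BNEZ  R1, Loop  (fall through once R1 reaches 0)
            break
    return mem


# Hypothetical data: ten doubles at addresses 8..80, scaled by F2 = 2.0.
memory = {addr: float(addr) for addr in range(8, 88, 8)}
run_loop(memory, r1=80, f2=2.0)
print(memory[80], memory[8])  # 160.0 16.0
```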

# Loop Example



Value of Register used for address, iteration control







Implicit renaming sets up data flow graph
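The renaming can be sketched directly. Assuming the entry counts visible in the tables (three load buffers, two multiply stations, three store buffers), F2 and R1 always ready, and no entry freed during the sketch, issuing two iterations maps F0 and F4 to different tags each time; `issue_iterations` is a hypothetical helper.

```python
def issue_iterations(n_iters):
    """Rename F0/F4 of the loop's FP body for n_iters iterations.

    Assumes 3 load buffers, 2 multiply stations and 3 store buffers,
    F2 and R1 always ready, and no entry freed during the sketch.
    """
    free = {
        "load": ["Load1", "Load2", "Load3"],
        "mult": ["Mult1", "Mult2"],
        "store": ["Store1", "Store2", "Store3"],
    }
    reg_status = {}  # register -> tag of the entry that will write it
    waits_on = {}    # entry tag -> tag whose result it needs
    for _ in range(n_iters):
        ld = free["load"].pop(0)           # LD    F0, 0(R1)
        reg_status["F0"] = ld
        mul = free["mult"].pop(0)          # MULTD F4, F0, F2
        waits_on[mul] = reg_status["F0"]   # Qj = current producer of F0
        reg_status["F4"] = mul
        st = free["store"].pop(0)          # SD    F4, 0(R1)
        waits_on[st] = reg_status["F4"]    # store data comes from producer of F4
    return reg_status, waits_on


status, graph = issue_iterations(2)
print(status)  # {'F0': 'Load2', 'F4': 'Mult2'}
print(graph)   # {'Mult1': 'Load1', 'Store1': 'Mult1', 'Mult2': 'Load2', 'Store2': 'Mult2'}
```

The second iteration overwrites the register status for F0 and F4, while the first iteration's MULTD and SD keep pointing at Load1 and Mult1. A third iteration would find the multiply list empty; this sketch then raises `IndexError`, where the hardware would instead stall issue.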

Instruction status:

| ITER | Instruction |    | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD          | F0 | 0  | R1 | 1     |           |              | Load1  | Yes  | 80   |       |
| 1    | MULTD       | F4 | F0 | F2 | 2     |           |              | Load2  | No   |      |       |
| 1    | SD          | F4 | 0  | R1 | 3     |           |              | Load3  | No   |      |       |
|      |             |    |    |    |       |           |              | Store1 | Yes  | 80   | Mult1 |
|      |             |    |    |    |       |           |              | Store2 | No   |      |       |
|      |             |    |    |    |       |           |              | Store3 | No   |      |       |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
|      | Mult1 | Yes  | MULTD |         | R(F2)   | Load1   |         |
|      | Mult2 | No   |       |         |         |         |         |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 4     | 80 | Fu | Load1 |    | Mult1 |    |    |     |     |     |     |

• Dispatching SUBI Instruction (not in FP queue)

Instruction status:

| ITER | Instruction |    | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD          | F0 | 0  | R1 | 1     |           |              | Load1  | Yes  | 80   |       |
| 1    | MULTD       | F4 | F0 | F2 | 2     |           |              | Load2  | No   |      |       |
| 1    | SD          | F4 | 0  | R1 | 3     |           |              | Load3  | No   |      |       |
|      |             |    |    |    |       |           |              | Store1 | Yes  | 80   | Mult1 |
|      |             |    |    |    |       |           |              | Store2 | No   |      |       |
|      |             |    |    |    |       |           |              | Store3 | No   |      |       |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
|      | Mult1 | Yes  | MULTD |         | R(F2)   | Load1   |         |
|      | Mult2 | No   |       |         |         |         |         |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 5     | 72 | Fu | Load1 |    | Mult1 |    |    |     |     |     |     |

And, BNEZ instruction (not in FP queue)



Notice that F0 never sees Load from location 80
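The reason is the tag check on the register-file side of the broadcast. In this minimal sketch (register values are opaque strings and `write_back` is a hypothetical helper), a register is written only if its result-status entry still names the broadcasting tag.

```python
def write_back(reg_status, regs, tag, value):
    """Register-file side of a CDB broadcast.

    A register takes `value` only if its result-status entry still names
    `tag`; if a later instruction has since claimed the register, the
    broadcast is ignored there.
    """
    for reg, owner in list(reg_status.items()):
        if owner == tag:
            regs[reg] = value
            del reg_status[reg]  # register now holds a final value
    return regs


# Register status as in the iteration-2 tables: F0 now belongs to Load2.
status = {"F0": "Load2", "F4": "Mult2"}
regs = {}
write_back(status, regs, "Load1", "M[80]")
print(regs)  # {}  (F0 never sees the load from 80)
write_back(status, regs, "Load2", "M[72]")
print(regs)  # {'F0': 'M[72]'}
```

The multiply station waiting on Load1 still captures M[80] from the CDB; only the architectural register F0 ignores it.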

Instruction status:

| ITER | Instruction |    | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD          | F0 | 0  | R1 | 1     |           |              | Load1  | Yes  | 80   |       |
| 1    | MULTD       | F4 | F0 | F2 | 2     |           |              | Load2  | Yes  | 72   |       |
| 1    | SD          | F4 | 0  | R1 | 3     |           |              | Load3  | No   |      |       |
| 2    | LD          | F0 | 0  | R1 | 6     |           |              | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD       | F4 | F0 | F2 | 7     |           |              | Store2 | No   |      |       |
|      |             |    |    |    |       |           |              | Store3 | No   |      |       |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
|      | Mult1 | Yes  | MULTD |         | R(F2)   | Load1   |         |
|      | Mult2 | Yes  | MULTD |         | R(F2)   | Load2   |         |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 7     | 72 | Fu | Load2 |    | Mult2 |    |    |     |     |     |     |

- Register file completely detached from computation
- First and Second iteration completely overlapped



Instruction status:

| ITER | Instruction |    | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD          | F0 | 0  | R1 | 1     | 9         |              | Load1  | Yes  | 80   |       |
| 1    | MULTD       | F4 | F0 | F2 | 2     |           |              | Load2  | Yes  | 72   |       |
| 1    | SD          | F4 | 0  | R1 | 3     |           |              | Load3  | No   |      |       |
| 2    | LD          | F0 | 0  | R1 | 6     |           |              | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD       | F4 | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD          | F4 | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
|      | Mult1 | Yes  | MULTD |         | R(F2)   | Load1   |         |
|      | Mult2 | Yes  | MULTD |         | R(F2)   | Load2   |         |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 9     | 72 | Fu | Load2 |    | Mult2 |    |    |     |     |     |     |

• Load1 completing: who is waiting?

Note: Dispatching SUBI



Load2 completing: who is waiting?

Note: Dispatching BNEZ



Next load in sequence

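Instruction status:

| ITER | Instruction |    | j  | k  | Issue | Exec Comp | Write Result | Buffer | Busy | Addr | Fu    |
|------|-------------|----|----|----|-------|-----------|--------------|--------|------|------|-------|
| 1    | LD          | F0 | 0  | R1 | 1     | 9         | 10           | Load1  | No   |      |       |
| 1    | MULTD       | F4 | F0 | F2 | 2     |           |              | Load2  | No   |      |       |
| 1    | SD          | F4 | 0  | R1 | 3     |           |              | Load3  | Yes  | 64   |       |
| 2    | LD          | F0 | 0  | R1 | 6     | 10        | 11           | Store1 | Yes  | 80   | Mult1 |
| 2    | MULTD       | F4 | F0 | F2 | 7     |           |              | Store2 | Yes  | 72   | Mult2 |
| 2    | SD          | F4 | 0  | R1 | 8     |           |              | Store3 | No   |      |       |

Reservation Stations:

| Time | Name  | Busy | Op    | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|-------|------|-------|---------|---------|---------|---------|
|      | Add1  | No   |       |         |         |         |         |
|      | Add2  | No   |       |         |         |         |         |
|      | Add3  | No   |       |         |         |         |         |
| 2    | Mult1 | Yes  | MULTD | M[80]   | R(F2)   |         |         |
|      | Mult2 | Yes  | MULTD | M[72]   | R(F2)   |         |         |

Register result status:

| Clock | R1 |    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-------|----|-------|----|----|-----|-----|-----|-----|
| 12    | 64 | Fu | Load3 |    | Mult2 |    |    |     |     |     |     |

- Why not issue third multiply?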

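### Instruction status (clock 13)

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|------|------|------|----|----|----|----|----|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|-----|-----|
| 13 | 64 | Fu | Load3 | | Mult2 | | | | | | |

Why not issue third store?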
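Both questions come down to the issue rule: an instruction issues only if a station of the right kind is free, and issue is in order, so a stalled instruction blocks everything behind it. A minimal Python sketch, assuming the station counts shown in the tables (2 multiply stations, 3 load and 3 store buffers) and ignoring the adders and integer instructions; `Station`, `make_pools`, and `try_issue` are hypothetical names for illustration:

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    busy: bool = False


def make_pools():
    # Assumed counts taken from the tables above; adders and integer ops omitted.
    return {
        "MULTD": [Station("Mult1"), Station("Mult2")],
        "LD": [Station(f"Load{i}") for i in (1, 2, 3)],
        "SD": [Station(f"Store{i}") for i in (1, 2, 3)],
    }


def try_issue(queue, pools):
    """Issue the instruction at the head of the queue if a station of its
    kind is free. Returns the station name, or None on a structural stall.
    Only the head is examined, so a stalled instruction blocks everything
    behind it (in-order issue)."""
    if not queue:
        return None
    for station in pools[queue[0]]:
        if not station.busy:
            station.busy = True
            queue.pop(0)
            return station.name
    return None  # every station for this op is busy


pools = make_pools()
for st in pools["MULTD"]:
    st.busy = True               # MULTDs from iterations 1 and 2 still executing
queue = ["MULTD", "SD"]          # iteration 3's multiply and store
print(try_issue(queue, pools))   # None: the third multiply stalls
print(queue)                     # ['MULTD', 'SD']: the store waits behind it
```

Store3 is free, but the third store still cannot issue, because the multiply ahead of it in program order has not issued.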



Mult1 completing. Who is waiting?



Mult2 completing. Who is waiting?





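### Instruction status (clock 18)

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]\*R2 |
| Store2 | Yes | 72 | [72]\*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation stations:

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|------|------|------|----|----|----|----|----|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|-----|-----|
| 18 | 64 | Fu | Load3 | | Mult1 | | | | | | |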



### Register result status

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|-----|-----|
| 19 | 56 | Fu | Load3 | | Mult1 | | | | | | |



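| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|-----|-----|
| 20 | 56 | Fu | Load1 | | Mult1 | | | | | | |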

Once again: in-order issue, out-of-order execution, and out-of-order completion.


# Why can Tomasulo overlap iterations of loops?

### Register renaming

- Multiple iterations use different physical destinations for registers (dynamic loop unrolling).

### Reservation stations

- Permit instruction issue to advance past integer control flow operations
- Also buffer old register values, completely avoiding the WAR stalls we saw in the scoreboard.
- Another perspective: Tomasulo's algorithm builds the data-flow dependence graph on the fly.
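A minimal sketch of the renaming half of this, assuming the register result status simply maps each FP register to the tag of its newest producer. The class and method names are hypothetical, and values are shown symbolically as `R(Fn)` in the style of the tables above:

```python
class RegisterResultStatus:
    """Maps an FP register to the tag (station/buffer name) that will write
    its next value. A register with no entry holds a ready value."""

    def __init__(self):
        self.producer = {}

    def read(self, reg):
        # A pending tag plays the role of Qj/Qk; otherwise return the value (Vj/Vk)
        return self.producer.get(reg, f"R({reg})")

    def issue(self, tag, dest, *srcs):
        operands = [self.read(r) for r in srcs]  # read sources before renaming dest
        self.producer[dest] = tag                # newest writer of dest is now `tag`
        return operands

    def write_back(self, tag, reg):
        # Clear the entry only if `tag` is still the newest writer (WAW-safe)
        if self.producer.get(reg) == tag:
            del self.producer[reg]


status = RegisterResultStatus()
for it, (ld, mul) in enumerate([("Load1", "Mult1"), ("Load2", "Mult2")], start=1):
    status.issue(ld, "F0")                      # LD F0 0 R1 (integer source not modeled)
    ops = status.issue(mul, "F4", "F0", "F2")   # MULTD F4 F0 F2
    print(f"iteration {it}: {mul} waits on {ops}")
print(status.producer)                          # {'F0': 'Load2', 'F4': 'Mult2'}
```

Both multiplies name F0 as a source, yet each waits on a different load buffer (`Load1`, then `Load2`), which is the dynamic loop unrolling described above. Because `write_back` clears an entry only when the writer is still the newest one, an older load finishing late cannot overwrite the mapping.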

# Tomasulo's scheme offers 2 major advantages

- (1) the distribution of the hazard detection logic
  - distributed reservation stations and the CDB
  - If multiple instructions are waiting on a single result and each already has its other operand, they can all be released simultaneously by one broadcast on the CDB
  - If a centralized register file were used, the units would have to read their results from the registers when register buses are available.
- (2) the elimination of stalls for WAW and WAR hazards
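Advantage (1) can be illustrated with a small sketch of one CDB write, assuming each reservation station keeps the Vj/Vk/Qj/Qk fields used in the tables. `ReservationStation` and `cdb_broadcast` are hypothetical names; real hardware compares the tag in every station in parallel, which the loop below only models sequentially:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None  # producer tag for operand j; None once vj is valid
    qk: Optional[str] = None  # producer tag for operand k; None once vk is valid

    def ready(self):
        return self.qj is None and self.qk is None


def cdb_broadcast(stations, tag, value):
    """One write on the common data bus: every station waiting on `tag`
    captures `value` in the same cycle. Returns the stations released."""
    released = []
    for rs in stations:
        was_ready = rs.ready()
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if not was_ready and rs.ready():
            released.append(rs.name)
    return released


stations = [
    ReservationStation("Add1", qj="Mult1", vk=2.0),
    ReservationStation("Add2", vj=1.0, qk="Mult1"),
    ReservationStation("Add3", qj="Mult1", qk="Load2"),
]
print(cdb_broadcast(stations, "Mult1", 6.0))  # ['Add1', 'Add2']
```

Add1 and Add2 already hold their other operand, so one broadcast releases both; Add3 captures the value but keeps waiting on Load2. No station had to read the register file to get the result.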

### What about Precise Interrupts?

Tomasulo had in-order issue, out-of-order execution, and out-of-order completion.

We need to "fix" the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.

# Relationship between precise interrupts and speculation:

- Speculation is a form of guessing.
- Important for branch prediction:
  - Need to "take our best shot" at predicting branch direction.
- If we speculate and are wrong, we need to back up and restart execution at the point where we predicted incorrectly:
  - This is exactly the same problem as precise exceptions!
- Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

### HW support for precise interrupts

- Need a HW buffer for the results of uncommitted instructions: the reorder buffer
  - 3 fields: instr, destination, value
  - Use the reorder buffer number instead of the reservation station when execution completes
  - Supplies operands between execution complete & commit
  - (Reorder buffer can be an operand source => more registers, like RS)
  - Instructions commit in program order
  - Once an instruction commits, its result is put into the register
  - As a result, it is easy to undo speculated instructions on mispredicted branches or exceptions
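A minimal sketch of a reorder buffer with the three fields listed above and in-order commit, assuming a fixed number of slots and a dictionary standing in for the register file. The class and method names are hypothetical; `flush` discards every uncommitted entry, which matches the case where a mispredicted branch has reached the head and everything behind it is on the wrong path:

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    instr: str                     # field 1: the instruction
    dest: Optional[str]            # field 2: destination register (None if none)
    value: Optional[float] = None  # field 3: result, filled in at write-back
    done: bool = False


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.order = deque()  # ROB numbers in program order; head on the left
        self.slots = {}       # ROB number -> entry
        self.next_num = 0

    def allocate(self, instr, dest):
        """At issue: reserve a slot; its number becomes the destination tag."""
        if len(self.order) == self.size:
            return None       # ROB full: issue stalls
        num = self.next_num
        self.next_num += 1
        self.slots[num] = ROBEntry(instr, dest)
        self.order.append(num)
        return num

    def write_result(self, num, value):
        """Out-of-order completion: the result waits in the ROB, not the register file."""
        entry = self.slots[num]
        entry.value, entry.done = value, True

    def commit(self, regfile):
        """Retire finished entries from the head, strictly in program order."""
        retired = []
        while self.order and self.slots[self.order[0]].done:
            entry = self.slots.pop(self.order.popleft())
            if entry.dest is not None:
                regfile[entry.dest] = entry.value
            retired.append(entry.instr)
        return retired

    def flush(self):
        """Mispredicted branch or exception: drop every uncommitted result."""
        self.order.clear()
        self.slots.clear()


regs = {"F0": 0.0, "F8": 0.0}
rob = ReorderBuffer(size=4)
ld = rob.allocate("LD F0,0(R1)", "F0")
add = rob.allocate("ADDD F8,F6,F2", "F8")  # hypothetical independent instruction
rob.write_result(add, 8.0)                 # ADDD finishes first
print(rob.commit(regs), regs)              # [] {'F0': 0.0, 'F8': 0.0}
rob.write_result(ld, 2.0)
print(rob.commit(regs), regs)              # ['LD F0,0(R1)', 'ADDD F8,F6,F2'] {'F0': 2.0, 'F8': 8.0}
```

The ADDD finishes first, but its result reaches F8 only after the LD ahead of it commits. Undoing speculation is then just a matter of emptying the buffer, since nothing in it has touched the register file yet.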



# Four Steps of Speculative Tomasulo Algorithm

### 1. Issue—get instruction from FP Op Queue

If a reservation station and a reorder buffer slot are free, issue the instruction and send its operands and the reorder buffer number for its destination (this stage is sometimes called "dispatch").

### 2. Execution—operate on operands (EX)

When both operands are ready, execute; if not, watch the CDB for the result. Once both operands are in the reservation station, execute. This step checks for RAW hazards (sometimes called "issue").

### 3. Write result—finish execution (WB)

Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.

### 4. Commit—update register with reorder result

When an instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer. (This step is sometimes called "graduation".)
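To see how the four steps interact, here is a rough cycle-count sketch under loudly simplified assumptions: one issue and at most one commit per cycle; unlimited reservation stations, ROB slots, and CDB bandwidth; a result usable the cycle after its CDB write; and hypothetical latencies (LD, ADDD, SUBD 2 cycles, MULTD 10, DIVD 40). Integer address registers are not modeled, and both the instruction sequence and the `schedule` function are illustrative:

```python
def schedule(program, latency):
    """Return (op, issue, exec_complete, write_result, commit) per instruction.

    Assumes one issue and at most one commit per cycle, unlimited stations,
    ROB slots and CDB bandwidth; a value written on the CDB in cycle t can be
    used by a waiting instruction in cycle t + 1.
    """
    written_at = {}  # register -> cycle its newest producer writes the CDB
    rows, last_commit = [], 0
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1
        start = max([issue + 1] + [written_at[r] + 1 for r in srcs if r in written_at])
        complete = start + latency[op] - 1
        write = complete + 1
        commit = max(write + 1, last_commit + 1)  # in-order commit, one per cycle
        last_commit = commit
        if dest is not None:
            written_at[dest] = write  # renaming: later readers see this producer
        rows.append((op, issue, complete, write, commit))
    return rows


LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}  # hypothetical
program = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]
for row in schedule(program, LATENCY):
    print(*row)
```

Under these assumptions SUBD and ADDD write their results in cycles 8 and 11 but commit only in cycles 18 and 59, after the MULTD and DIVD ahead of them. Execution and write-back are out of order; commit is not, which is what keeps the register state precise.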

# What are the hardware complexities with reorder buffer (ROB)?



- How do you find the latest version of a register?
  - As described in the Smith paper, this needs an associative comparison network
  - Alternatively, use a future file, or simply use the register result status buffer to track which reorder buffer entry holds (or will receive) the latest value
- The ROB needs as many ports as the register file
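The two ways of finding the latest version of a register can be contrasted in a sketch, assuming ROB entries carry their own number and that a value of `None` means the result has not been written yet. Both functions are hypothetical illustrations; the future-file alternative is not modeled:

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    rob_num: int
    dest: str
    value: Optional[float]  # None until the result has been written


def lookup_associative(rob, reg, regfile):
    """Compare `reg` with every ROB destination; hardware does this in
    parallel and picks the youngest match. Here we scan youngest-first."""
    for e in reversed(rob):
        if e.dest == reg:
            return ("value", e.value) if e.value is not None else ("wait", e.rob_num)
    return ("value", regfile[reg])


def lookup_status(status, rob_by_num, reg, regfile):
    """Register result status already names the newest ROB entry for `reg`,
    so one indexed read replaces the search."""
    num = status.get(reg)
    if num is None:
        return ("value", regfile[reg])
    e = rob_by_num[num]
    return ("value", e.value) if e.value is not None else ("wait", num)


regfile = {"F0": 0.0, "F2": 2.0, "F4": 4.0}
rob = [Entry(3, "F0", 1.5), Entry(4, "F4", None), Entry(5, "F0", None)]  # oldest first
status = {"F0": 5, "F4": 4}  # kept up to date at issue time
rob_by_num = {e.rob_num: e for e in rob}
for reg in ("F0", "F2", "F4"):
    print(reg, lookup_associative(rob, reg, regfile),
          lookup_status(status, rob_by_num, reg, regfile))
```

Both lookups agree: F0 must wait for entry 5 even though the older entry 3 already holds a value, and F2 comes straight from the register file. The associative version needs a comparator per ROB entry; the status-table version trades that for updating one table entry at issue.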

### Summary

- Reservation stations: implicit register renaming to a larger set of registers + buffering of source operands
  - Prevents registers as bottleneck
  - Avoids WAR, WAW hazards of Scoreboard
  - Allows loop unrolling in HW
- Not limited to basic blocks (the integer unit gets ahead, beyond branches)
- Today, helps cache misses as well
  - Don't stall for L1 Data cache miss (insufficient ILP for L2 miss?)
- Lasting Contributions
  - Dynamic scheduling
  - Register renaming
  - Load/store disambiguation
- 360/91 descendants include the Pentium III, PowerPC 604, MIPS R10000, HP PA-8000, and Alpha 21264