# Lecture-12 (Tomasulo's Algorithm) CS422-Spring 2018





# Another Dynamic One: Tomasulo's Algorithm

- For IBM 360/91 about 3 years after CDC 6600 (1966)
- Goal: High Performance without special compilers
- Differences between IBM 360 & CDC 6600 ISA
  - IBM has only 2 register specifiers/instruction vs. 3 in CDC 6600
  - IBM has 4 FP registers vs. 8 in CDC 6600
  - IBM has memory-register ops

• Why Study? lead to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...

### Tomasulo's Organization



#### Tomasulo vs Scoreboard

- Control & buffers distributed with Function Units (FU) vs. centralized in scoreboard;
  - FU buffers called "reservation stations"; have pending operands
- Registers in instructions replaced by values or pointers to reservation stations(RS); called register renaming;
  - avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't
- Results to FU from RS, <u>not through registers</u>, over <u>Common Data Bus</u> that broadcasts results to all FUs
- Load and Stores treated as FUs with RSs as wells

### Reservation Station Components

Op: Operation to perform in the unit (e.g., + or −)

Vj, Vk: Value of Source operands

- Store buffers has V field, result to be stored

Qj, Qk: Reservation stations producing source registers (value to be written)

- Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready
- Store buffers only have Qi for RS producing result

Busy: Indicates reservation station or FU is busy

Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.

### Three Stages of Tomasulo Algorithm

- 1. Issue—get instruction from FP Op Queue

  If reservation station free (no structural hazard),
  control issues instr & sends operands (renames registers).
- 2. Execution—operate on operands (EX)

  When both operands ready then execute;
  if not ready, watch Common Data Bus for result
- 3. Write result—finish execution (WB)

  Write on Common Data Bus to all awaiting units;
  mark reservation station available
- Normal data bus: data + destination ("go to" bus)
- <u>Common data bus</u>: data + <u>source</u> ("<u>come from</u>" bus)
  - 64 bits of data + 4 bits of Functional Unit <u>source</u> address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast

### An Example



#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30



Load: 2 cycle

FP add: 2 cycles

FP multiply: 10 cycles

FP divide: 40 cycles

#### Register result status:

Clock



#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1

#### Note: Unlike 6600, can have multiple loads outstanding



#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30

Mult1 Load2 Load1

- Note: registers names are removed ("renamed") in Reservation Stations;
   MULT issued vs. scoreboard
- · Load1 completing; what is waiting for Load1?

**CS422: Spring 2018** 

Biswabandan Panda, CSE@IITK



#### Register result status:

| Clock |    | F0    | F2    | F4 | <i>F6</i> | F8   | F10 | <i>F12</i> | ••• | F30 |
|-------|----|-------|-------|----|-----------|------|-----|------------|-----|-----|
| 4     | FU | Mult1 | Load2 |    | M(A1)     | Add1 |     |            |     |     |

#### Load2 completing; what is waiting for Load2?

CS422: Spring 2018



#### Register result status:

Clock F10F12F2*F4 F6* F8 *F30* FOMult1 M(A2)M(A1)5 FUAdd1 Mult2



#### Register result status:

| Clock |    | F0    | F2    | <i>F4</i> | <i>F6</i> | F8   | F10   | <i>F12</i> | ••• | F30 |
|-------|----|-------|-------|-----------|-----------|------|-------|------------|-----|-----|
| 6     | FU | Mult1 | M(A2) |           | Add2      | Add1 | Mult2 |            |     |     |

#### · Issue ADDD here vs. scoreboard?

| Instructio  | n stai | tus:   |      |       | Exec  | Write      |       |       |      |         |
|-------------|--------|--------|------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on     | j      | k    | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6     | 34+    | R2   | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2     | 45+    | R3   | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | FO     | F2     | F4   | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8     | F6     | F2   | 4     | 7     |            |       |       |      |         |
| DIVD        | F10    | F0     | F6   | 5     |       |            |       |       |      |         |
| ADDD        | F6     | F8     | F2   | 6     |       |            |       |       |      |         |
| Reservation | on St  | ations | s:   |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time   | Name   | Busy | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             | 0      | Add1   | Yes  | SUBD  | M(A1) | M(A2)      |       |       |      |         |
|             |        | Add2   | Yes  | ADDD  |       | M(A2)      | Add1  |       |      |         |
|             |        | Add3   | No   |       |       |            |       |       |      |         |
|             | 8      | Mult1  | Yes  | MULTD | M(A2) | R(F4)      |       |       |      |         |
|             |        | Mult2  | Yes  | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 FU Mult1 M(A2) Add2 Add1 Mult2

#### · Add1 completing; what is waiting for it?

| Instructio  | n sta | tus:   |                  |       | Exec  | Write      |       |       |      |         |
|-------------|-------|--------|------------------|-------|-------|------------|-------|-------|------|---------|
| Instruction | on    | j      | $\boldsymbol{k}$ | Issue | Comp  | Result     |       |       | Busy | Address |
| LD          | F6    | 34+    | R2               | 1     | 3     | 4          |       | Load1 | No   |         |
| LD          | F2    | 45+    | <b>R</b> 3       | 2     | 4     | 5          |       | Load2 | No   |         |
| MULTD       | F0    | F2     | F4               | 3     |       |            |       | Load3 | No   |         |
| SUBD        | F8    | F6     | F2               | 4     | 7     | 8          |       |       |      |         |
| DIVD        | F10   | FO     | F6               | 5     |       |            |       |       |      |         |
| ADDD        | F6    | F8     | F2               | 6     |       |            |       |       |      |         |
| Reservatio  | on St | ations | 5.               |       | S1    | <i>S</i> 2 | RS    | RS    |      |         |
|             | Time  | Name   | Busy             | Op    | Vj    | Vk         | Qj    | Qk    |      |         |
|             |       | Add1   | No               |       |       |            |       |       |      |         |
|             | 2     | Add2   | Yes              | ADDD  | (M-M) | M(A2)      |       |       |      |         |
|             |       | Add3   | No               |       |       |            |       |       |      |         |
|             | 7     | Mult1  | Yes              | MULTE | M(A2) | R(F4)      |       |       |      |         |
|             |       | Mult2  | Yes              | DIVD  |       | M(A1)      | Mult1 |       |      |         |

#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30

8 FU Mult1 M(A2) Add2 (M-M) Mult2



#### Register result status:

```
Instruction status:
                                 Exec Write
                                 Comp Result
                                                          Busy Address
   Instruction
                       k
                           Issue
   LD
            F6
                 34 +
                      R2
                                    3
                                                    Load1
                                                            No
                                          4
            F2
                 45+
                       R3
   LD
                                                    Load2
                                          5
                                                            No
   MULTD
            F0
                 F2
                       F4
                                                    Load3
                                                            No
   SUBD
                 F6
                                          8
   DIVD
           F10
                 F0
                      F6
   ADDD
            F6
                 F8
                       F2
                                   10
Reservation Stations:
                                   SI
                                         S2
                                               RS
                                                     RS
                                   V_i
                                         Vk
                                               Qj
          Time Name Busy
                            Op
                                                     Qk
                Add1
                      No
              0 Add2
                      Yes ADDD (M-M) M(A2)
                Add3
                      No
              5 Mult1
                      Yes MULTDM(A2) R(F4)
                      Yes
                                       M(A1) Mult1
                Mult2
                           DIVD
```

#### Register result status:

Clock F10 *F12* F2F8 F30 FOF4 F6 Mult1 M(A2)**10** FUAdd2 (M-M)Mult2

#### · Add2 completing; what is waiting for it?



#### Register result status:

- · Write result of ADDD here vs. scoreboard?
- · All quick instructions complete in this cycle!

CS422: Spring 2018







FU



FU



#### Register result status:

Clock F0 F2 F4 F6 F8 F10 F12 ... F30 16 FU M\*F4 M(A2) (M-M+N (M-M) Mult2



#### Register result status:



#### Register result status:

· Mult2 is completing; what is waiting for it?

CS422: Spring 2018



#### Register result status:

· Once again: In-order issue, out-of-order execution and completion.

| istruction status: |     |           |    |         |    |   | Read | Exec | 1   | Write |   |
|--------------------|-----|-----------|----|---------|----|---|------|------|-----|-------|---|
| Instruction        |     | $\dot{J}$ | k  | k Issue |    | ( | Oper | Comp | ) ] | Resul | t |
| LD                 | F6  | 34+       | R2 |         | 1  |   | 2    | 3    |     | 4     |   |
| LD                 | F2  | 45+       | R3 |         | 5  |   | 6    | 7    |     | 8     |   |
| MULTD              | F0  | F2        | F4 |         | 6  |   | 9    | 19   |     | 20    |   |
| SUBD               | F8  | F6        | F2 |         | 7  |   | 9    | 11   |     | 12    |   |
| DIVD               | F10 | F0        | F6 |         | 8  |   | 21   | 61   |     | 62    |   |
| ADDD               | F6  | F8        | F2 | I       | 13 |   | 14   | 16   |     | 22    |   |

|       | Exec  | Writ   | e   |
|-------|-------|--------|-----|
| Issue | e Com | p.Resu | ılt |
| 1     | 3     | 4      |     |
| 2     | 4     | 5      |     |
| 3     | 15    | 16     |     |
| 4     | 7     | 8      |     |
| 5     | 56    | 57     |     |
| 6     | 10    | 11     |     |

#### Tomasulo vs Scoreboard

Pipelined Functional Units

(6 load, 3 store,  $3 + 2x/\div$ )

window size: ≤ 14 instructions

No issue on structural hazard

WAR: renaming avoids

WAW: renaming avoids

Broadcast results from FU

Control: reservation stations

Multiple Functional Units

(1 load/store, 1 + , 2 x, 1 ÷)

≤ 5 instructions

same

stall completion

stall issue

Write/read registers

central scoreboard