## **Advanced Computer Architecture**

### **Dynamic Scheduling**

Cao Qiang (曹强), Storage Research Institute, School of Computer Science; Wuhan National Laboratory for Optoelectronics

### **Recap: In-Order Commit for Precise Traps**



- In-order instruction fetch and decode, and dispatch to reservation stations inside reorder buffer
- Instructions issue from reservation stations out-of-order
- Out-of-order completion, values stored in temporary buffers
- Commit is in-order, checks for traps, and if none updates architectural state

### **Phases of Instruction Execution**



### Scoreboard: a bookkeeping technique

- Out-of-order execution divides the ID stage into two:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
- Scoreboards date to the CDC 6600 in 1963
  - Readings for Monday include one on the CDC 6600
- Instructions execute whenever they do not depend on previous instructions and there are no hazards.
- CDC 6600: in-order issue, out-of-order execution, out-of-order commit (or completion)
  - No forwarding!
  - Imprecise interrupt/exception model for now

### **Scoreboard Architecture (CDC 6600)**



2/08/2012

### **Scoreboard Implications**

- Out-of-order completion => WAR, WAW hazards?
- Solutions for WAR:
  - Stall writeback until registers have been read
  - Read registers only during Read Operands stage
- Solution for WAW:
  - Detect hazard and stall issue of new instruction until other instruction completes
- No register renaming
- Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
- Scoreboard keeps track of dependencies between instructions that have already issued
- Scoreboard replaces ID, EX, WB with 4 stages
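
The hazard types named in this list can be made concrete with a small classifier. This is only an illustrative sketch: the `Instr` tuple and the `hazards` function are hypothetical names introduced here, and the register operands are taken from the DIVD/ADDD/MULTD instructions used in the scoreboard example of these slides.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Return the hazards that `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # later reads a register that earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites a register that earlier reads
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register
    return found


multd = Instr("MULTD", "F0", ("F2", "F4"))
divd = Instr("DIVD", "F10", ("F0", "F6"))
addd = Instr("ADDD", "F6", ("F8", "F2"))

print(hazards(multd, divd))  # DIVD needs F0 from MULTD
print(hazards(divd, addd))   # ADDD overwrites F6, which DIVD still has to read
```

With in-order completion only the RAW case matters; the WAR and WAW cases appear once instructions may complete out of order, which is why the scoreboard needs the two stall rules listed above.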

### Four Stages of Scoreboard Control

- Issue—decode instructions & check for structural hazards (ID1)
  - Instructions issued in program order (for hazard checking)
  - Don't issue if structural hazard
  - Don't issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
- Read operands—wait until no data hazards, then read operands (ID2)
  - All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  - No forwarding of data in this model!
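
The two checks described for the Issue and Read-operands stages can be sketched as predicates over the scoreboard state. The function names and the set/dict representation are assumptions made for illustration; only the conditions themselves come from the slide.

```python
from typing import Dict, Set


def can_issue(fu: str, dest: str, busy: Set[str], result: Dict[str, str]) -> bool:
    """Issue only if the unit is free (no structural hazard) and no
    in-flight instruction is going to write `dest` (no WAW hazard)."""
    return fu not in busy and dest not in result


def can_read_operands(rj: bool, rk: bool) -> bool:
    """Read operands only when both sources are ready (no RAW hazard)."""
    return rj and rk


# State while LD F6,34+R2 occupies the Integer unit.
busy = {"Integer"}
result = {"F6": "Integer"}  # F6 will be written by the Integer unit

print(can_issue("Integer", "F2", busy, result))  # second LD: unit is busy
print(can_issue("Add", "F6", busy, result))      # would be a WAW hazard on F6
print(can_issue("Mult1", "F0", busy, result))    # unit free, no pending writer
```

Note that `can_issue` only covers the hazard checks. Because issue is in program order, an instruction that passes this test still waits behind any earlier instruction that has not issued yet.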

### Four Stages of Scoreboard Control

- Execution: operate on operands (EX)
  - The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
- Write result: finish execution (WB)
  - Stall until no WAR hazards with previous instructions. Example:

```
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
```

The CDC 6600 scoreboard would stall the write-back of SUBD until ADDD has read its operands, because ADDD still needs the old value of F8.
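
The write-result stall for this example can be sketched as a check over the functional-unit fields. This is a hedged illustration, not the CDC 6600 logic itself: the `FU` record and the unit names `Add1` and `Add2` are hypothetical (two adder units are assumed so that ADDD and SUBD can be in flight together).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    Fi: str             # destination register
    Fj: Optional[str]   # first source register
    Fk: Optional[str]   # second source register
    Rj: bool            # Fj ready and not yet read
    Rk: bool            # Fk ready and not yet read


def can_write_result(name: str, units: Dict[str, FU]) -> bool:
    """A unit may write only if no unit still has to read the old value
    of its destination register (no WAR hazard)."""
    dest = units[name].Fi
    return all(
        (f.Fj != dest or not f.Rj) and (f.Fk != dest or not f.Rk)
        for f in units.values()
    )


units = {
    "Divide": FU("F0", "F2", "F4", Rj=False, Rk=False),  # DIVD, executing
    "Add1": FU("F10", "F0", "F8", Rj=False, Rk=True),    # ADDD, waiting for F0
    "Add2": FU("F8", "F8", "F14", Rj=False, Rk=False),   # SUBD, finished executing
}

print(can_write_result("Add2", units))  # ADDD has not read F8 yet
units["Add1"].Rk = False                # ADDD reads its operands
print(can_write_result("Add2", units))
```

The check looks only at the ready flags: once ADDD has read its operands, its flags drop to "No" and SUBD is free to overwrite F8.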

### Three Parts of the Scoreboard

1. **Instruction status:** which of the 4 steps the instruction is in
2. **Functional unit status:** indicates the state of the functional unit (FU); 9 fields for each functional unit
   - Busy: indicates whether the unit is busy or not
   - Op: operation to perform in the unit (e.g., + or –)
   - Fi: destination register
   - Fj, Fk: source-register numbers
   - Qj, Qk: functional units producing source registers Fj, Fk
   - Rj, Rk: flags indicating when Fj, Fk are ready
3. **Register result status:** indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
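
The three parts can be pictured as three small tables. The following is a minimal sketch of one possible in-memory layout; the class and attribute names are assumptions chosen for illustration, while the unit names and the sample state follow the scoreboard example in these slides.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The nine per-unit fields."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # source register 1
    fk: Optional[str] = None    # source register 2
    qj: Optional[str] = None    # unit producing Fj, if still pending
    qk: Optional[str] = None    # unit producing Fk, if still pending
    rj: Optional[bool] = None   # Fj ready and not yet read?
    rk: Optional[bool] = None   # Fk ready and not yet read?


@dataclass
class Scoreboard:
    # instruction index -> {stage name: clock cycle in which it completed}
    instruction_status: Dict[int, Dict[str, int]] = field(default_factory=dict)
    # unit name -> its nine fields
    fu_status: Dict[str, FUStatus] = field(default_factory=dict)
    # register -> unit that will write it (absent means no pending writer)
    register_result: Dict[str, str] = field(default_factory=dict)


sb = Scoreboard(fu_status={name: FUStatus() for name in
                           ("Integer", "Mult1", "Mult2", "Add", "Divide")})

# State after LD F6,34+R2 has issued (clock 1) and read operands (clock 2),
# entered by hand to mirror the example tables.
sb.instruction_status[0] = {"issue": 1, "read_operands": 2}
sb.fu_status["Integer"] = FUStatus(busy=True, op="Load", fi="F6", fk="R2", rk=True)
sb.register_result["F6"] = "Integer"

print(len(fields(FUStatus)), "fields per functional unit")
```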

### **Scoreboard Example**

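#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | | | | |
| LD | F2 | 45+ | R3 | | | | |
| MULTD | F0 | F2 | F4 | | | | |
| SUBD | F8 | F6 | F2 | | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| | FU | | | | | | | | | |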

## **Detailed Scoreboard Pipeline Control**

| Instruction status | Wait until | Bookkeeping |
|---|---|---|
| Issue | Not Busy(FU) and not Result(D) | Busy(FU) ← Yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2; Qj(FU) ← Result(S1); Qk(FU) ← Result(S2); Rj(FU) ← not Qj(FU); Rk(FU) ← not Qk(FU); Result(D) ← FU |
| Read operands | Rj and Rk | Rj ← No; Rk ← No |
| Execution complete | Functional unit done | |
| Write result | ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) and (Fk(f) ≠ Fi(FU) or Rk(f) = No)) | ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No |
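
The control table can be exercised with a small cycle-by-cycle simulation. The sketch below is an illustration under stated assumptions, not a model of the real CDC 6600: the latencies for LD (1), ADDD/SUBD (2) and MULTD (10) are read off the cycle tables of the example, while DIVD = 40 is an assumed textbook value; all decisions in a cycle are taken from the start-of-cycle state and then applied issue first, write last; and Qj/Qk are cleared on write-back so the state matches the way the tables are drawn. The names `simulate`, `PROGRAM` and `FU` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LATENCY = {"LD": 1, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}  # DIVD assumed
UNITS_FOR = {"LD": ["Integer"], "MULTD": ["Mult1", "Mult2"],
             "ADDD": ["Add"], "SUBD": ["Add"], "DIVD": ["Divide"]}
Instr = Tuple[str, str, Optional[str], Optional[str]]  # (op, dest, S1, S2)


@dataclass
class FU:
    busy: bool = False
    Fi: Optional[str] = None
    Fj: Optional[str] = None
    Fk: Optional[str] = None
    Qj: Optional[str] = None
    Qk: Optional[str] = None
    Rj: bool = False
    Rk: bool = False
    time: int = 0    # remaining execution cycles
    instr: int = -1  # index of the instruction held by this unit


def simulate(program: List[Instr], max_cycles: int = 500) -> List[Dict[str, int]]:
    fus = {n: FU() for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
    result: Dict[str, str] = {}  # register -> unit that will write it
    when: List[Dict[str, int]] = [{} for _ in program]
    pc = clock = 0
    while any("write" not in w for w in when) and clock < max_cycles:
        clock += 1
        # Decide every action from the state at the start of the cycle.
        issue_to = None
        if pc < len(program):
            op, d, s1, s2 = program[pc]
            free = [n for n in UNITS_FOR[op] if not fus[n].busy]
            if free and d not in result:  # no structural and no WAW hazard
                issue_to = free[0]
        busy = [(n, f) for n, f in fus.items() if f.busy]
        readers = [f for _, f in busy
                   if "read" not in when[f.instr] and f.Rj and f.Rk]
        running = [f for _, f in busy
                   if "read" in when[f.instr] and "exec" not in when[f.instr]]
        writers = [n for n, f in busy
                   if "exec" in when[f.instr] and "write" not in when[f.instr]
                   and all((g.Fj != f.Fi or not g.Rj) and (g.Fk != f.Fi or not g.Rk)
                           for g in fus.values())]  # no WAR hazard
        # Apply the bookkeeping column of the table.
        if issue_to is not None:
            op, d, s1, s2 = program[pc]
            qj, qk = result.get(s1), result.get(s2)
            fus[issue_to] = FU(True, d, s1, s2, qj, qk, qj is None, qk is None, 0, pc)
            result[d] = issue_to
            when[pc]["issue"] = clock
            pc += 1
        for f in readers:
            f.Rj = f.Rk = False
            f.time = LATENCY[program[f.instr][0]]
            when[f.instr]["read"] = clock
        for f in running:
            f.time -= 1
            if f.time == 0:
                when[f.instr]["exec"] = clock
        for n in writers:
            f = fus[n]
            for g in fus.values():
                if g.Qj == n:
                    g.Qj, g.Rj = None, True
                if g.Qk == n:
                    g.Qk, g.Rk = None, True
            del result[f.Fi]
            when[f.instr]["write"] = clock
            fus[n] = FU()
    return when


PROGRAM: List[Instr] = [
    ("LD", "F6", None, "R2"), ("LD", "F2", None, "R3"),
    ("MULTD", "F0", "F2", "F4"), ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"), ("ADDD", "F6", "F8", "F2"),
]

for (op, dest, _, _), stages in zip(PROGRAM, simulate(PROGRAM)):
    print(op, dest, stages)
```

Taking all decisions from the start-of-cycle state is what keeps a result written in one cycle from being read before the next cycle, consistent with the no-forwarding model.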










### Scoreboard Example: Cycle 2

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F6 | | R2 | | | | Yes |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | FU | | | | Integer | | | | | |

- Issue 2nd LD?



### Scoreboard Example: Cycle 3

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F6 | | R2 | | | | No |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | FU | | | | Integer | | | | | |

- Issue MULT?

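### Scoreboard Example: Cycle 5

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | | | |
| MULTD | F0 | F2 | F4 | | | | |
| SUBD | F8 | F6 | F2 | | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |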






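### Scoreboard Example: Cycle 6

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | | |
| MULTD | F0 | F2 | F4 | 6 | | | |
| SUBD | F8 | F6 | F2 | | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | Yes |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | FU | Mult1 | Integer | | | | | | | |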

### Scoreboard Example: Cycle 7

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULTD | F0 | F2 | F4 | 6 | | | |
| SUBD | F8 | F6 | F2 | 7 | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | No |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | FU | Mult1 | Integer | | | Add | | | | |

- Read multiply operands?

### Scoreboard Example: Cycle 8a (First half of clock cycle)

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | |
| MULTD | F0 | F2 | F4 | 6 | | | |
| SUBD | F8 | F6 | F2 | 7 | | | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | No |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | Integer | Yes | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | Integer | | | Add | Divide | | | |

### Scoreboard Example: Cycle 8b (Second half of clock cycle)

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | | | |
| SUBD | F8 | F6 | F2 | 7 | | | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | | | | Add | Divide | | | |

### Scoreboard Example: Cycle 9

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

Note: the Time column shows the remaining execution cycles.

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 10 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | FU | Mult1 | | | | Add | Divide | | | |

- Read operands for MULT & SUB? Issue ADDD?


### Scoreboard Example: Cycle 10

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 9 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 1 | Add | Yes | Sub | F8 | F6 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | FU | Mult1 | | | | Add | Divide | | | |

### Scoreboard Example: Cycle 11

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 8 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 0 | Add | Yes | Sub | F8 | F6 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 11 | FU | Mult1 | | | | Add | Divide | | | |

### Scoreboard Example: Cycle 12

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 7 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | FU | Mult1 | | | | | Divide | | | |

- Read operands for DIVD?

### Scoreboard Example: Cycle 13

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 6 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 13 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example: Cycle 14

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 5 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Add | F6 | F8 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example: Cycle 15

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 4 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 1 | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |





### Scoreboard Example: Cycle 16

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 3 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| 0 | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |



### Scoreboard Example: Cycle 17

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 2 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | FU | Mult1 | | | Add | | Divide | | | |

- WAR hazard!
- Why not write result of ADD???

### Scoreboard Example: Cycle 18

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 1 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 18 | FU | Mult1 | | | Add | | Divide | | | |

### Scoreboard Example: Cycle 19

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 0 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

### Scoreboard Example: Cycle 20

#### Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | | | Yes | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | FU | | | | Add | | Divide | | | |

#### Instruction status:

| Instruction |     | j    | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|------|----|-------|-----------|-----------|--------------|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD       | F0    | F2   | F4         | 6     | 9    | 19   | 20     |
| SUBD        | F8    | F6   | F2         | 7     | 9    | 11   | 12     |
| DIVD        | F10   | F0   | F6         | 8     | 21   |      |        |
| ADDD        | F6    | F8   | F2         | 13    | 14   | 16   |        |

#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|
| Integer                 | No   |     |      |    |           |    |    |     |     |
| Mult1                   | No   |     |      |    |           |    |    |     |     |
| Mult2                   | No   |     |      |    |           |    |    |     |     |
| Add                     | Yes  | Add | F6   | F8 | F2        |    |    | No  | No  |
| Divide                  | Yes  | Div | F10  | F0 | F6        |    |    | Yes | Yes |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 21 | FU |  |  |  | Add |  | Divide |  |  |  |

- WAR hazard is now gone: DIVD read F6 in cycle 21, so ADDD can write F6.

#### Instruction status:

| Instruction |     | j    | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|------|----|-------|-----------|-----------|--------------|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 |  |  |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | 22 |

#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|
| Integer | No |  |  |  |  |  |  |  |  |
| Mult1 | No |  |  |  |  |  |  |  |  |
| Mult2 | No |  |  |  |  |  |  |  |  |
| Add | No |  |  |  |  |  |  |  |  |
| 39 Divide | Yes | Div | F10 | F0 | F6 |  |  | No | No |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 22 | FU |  |  |  |  |  | Divide |  |  |  |

### Faster than light computation (skip a couple of cycles)

#### Instruction status:

| Instruction |     | j    | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|------|----|-------|-----------|-----------|--------------|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 | 61 |  |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | 22 |

#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|
| Integer | No |  |  |  |  |  |  |  |  |
| Mult1 | No |  |  |  |  |  |  |  |  |
| Mult2 | No |  |  |  |  |  |  |  |  |
| Add | No |  |  |  |  |  |  |  |  |
| 0 Divide | Yes | Div | F10 | F0 | F6 |  |  | No | No |



### **Scoreboard Example: Cycle 62**

#### Instruction status:

| Instruction |     | j    | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|------|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 5     | 6    | 7    | 8      |
| MULTD       | F0    | F2   | F4 | 6     | 9    | 19   | 20     |
| SUBD        | F8    | F6   | F2 | 7     | 9    | 11   | 12     |
| DIVD        | F10   | F0   | F6 | 8     | 21   | 61   | 62     |
| ADDD        | F6    | F8   | F2 | 13    | 14   | 16   | 22     |

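#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|
| Integer | No |  |  |  |  |  |  |  |  |
| Mult1 | No |  |  |  |  |  |  |  |  |
| Mult2 | No |  |  |  |  |  |  |  |  |
| Add | No |  |  |  |  |  |  |  |  |
| Divide | No |  |  |  |  |  |  |  |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 62 | FU |  |  |  |  |  |  |  |  |  |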

### Review: Scoreboard Example: Cycle 62



#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|
| Integer | No |  |  |  |  |  |  |  |  |
| Mult1 | No |  |  |  |  |  |  |  |  |
| Mult2 | No |  |  |  |  |  |  |  |  |
| Add | No |  |  |  |  |  |  |  |  |
| Divide | No |  |  |  |  |  |  |  |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 62 | FU |  |  |  |  |  |  |  |  |  |

- In-order issue; out-of-order execute & commit

#### CDC 6600 Scoreboard

- Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache) limits benefit
- Limitations of 6600 scoreboard:
  - No forwarding hardware
  - Limited to instructions in basic block (small window)
  - Small number of functional units (structural hazards), especially integer/load store units
  - Do not issue on structural hazards
  - Wait for WAR hazards
  - Prevent WAW hazards
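
The last three rules in the list above can be sketched as checks against the scoreboard tables. This is a minimal sketch, not the CDC 6600 logic itself: the names `FU`, `can_issue`, and `can_write_result` are hypothetical, and the state loaded below is the cycle-20 snapshot from the example (ADDD waiting to write F6 while DIVD has not yet read it).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    """One row of the functional unit status table (field names follow the slides)."""
    busy: bool = False
    op: str = ""
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    rj: bool = False           # Fj is ready and has not been read yet
    rk: bool = False           # Fk is ready and has not been read yet


def can_issue(fus: Dict[str, FU], result: Dict[str, str], unit: str, dest: str) -> bool:
    """Issue only if the unit is free (no structural hazard) and no active
    instruction already targets `dest` (no WAW hazard)."""
    return not fus[unit].busy and dest not in result


def can_write_result(fus: Dict[str, FU], unit: str) -> bool:
    """Stall the write while another unit still has to read the old value of
    our destination register (WAR hazard)."""
    dest = fus[unit].fi
    for name, other in fus.items():
        if name == unit or not other.busy:
            continue
        if (other.fj == dest and other.rj) or (other.fk == dest and other.rk):
            return False
    return True


# Cycle-20 snapshot: Add holds ADDD F6,F8,F2; Divide holds DIVD F10,F0,F6.
fus = {
    "Integer": FU(), "Mult1": FU(), "Mult2": FU(),
    "Add": FU(True, "Add", "F6", "F8", "F2", False, False),
    "Divide": FU(True, "Div", "F10", "F0", "F6", True, True),
}
result = {"F6": "Add", "F10": "Divide"}   # register result status

war_stall_cycle_20 = not can_write_result(fus, "Add")   # DIVD has not read F6 yet
fus["Divide"].rj = fus["Divide"].rk = False              # DIVD reads its operands
war_clear_cycle_22 = can_write_result(fus, "Add")        # ADDD may now write F6
```

With this state, a new instruction writing F6 could not issue to `Mult1` (WAW on F6), and nothing can issue to `Add` while it is busy (structural hazard).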

## **Another Dynamic Algorithm: Tomasulo Algorithm**

- For IBM 360/91 about 3 years after CDC 6600 (1966)
- Goal: High Performance without special compilers
- Differences between IBM 360 & CDC 6600 ISA
  - IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
  - IBM has 4 FP registers vs. 8 in CDC 6600
  - IBM has memory-register ops
- Why study? It led to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...

### **Tomasulo Organization**



Common Data Bus (CDB)


### Tomasulo Algorithm vs. Scoreboard

- Control & buffers <u>distributed</u> with functional units (FUs) vs. centralized in scoreboard
  - FU buffers are called "reservation stations"; they hold pending operands
- Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called <u>register renaming</u>
  - Avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't
- Results go to FUs from RSs, <u>not through registers</u>, over the <u>Common Data Bus</u>, which broadcasts results to all FUs
- Loads and stores are treated as FUs with RSs as well
- Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
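
To make the renaming bullet concrete, here is a minimal sketch of how replacing register names with producer tags removes WAR and WAW hazards. The function `rename` and the tags `RS1`, `RS2`, ... are hypothetical stand-ins for reservation-station names (real hardware reuses a small set of stations), and `Reg[Fx]` stands for a value copied out of the register file at issue time.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str, str]  # (op, dest, src1, src2)


def rename(program: List[Instr]) -> List[Instr]:
    """Replace each source register by the tag of its latest producer, and give
    each result a fresh tag (standing in for a reservation-station name)."""
    latest: Dict[str, str] = {}   # architectural register -> tag of pending producer
    renamed = []
    for n, (op, dest, s1, s2) in enumerate(program, start=1):
        tag = f"RS{n}"
        # Sources with no pending producer are copied from the register file now.
        src1 = latest.get(s1, f"Reg[{s1}]")
        src2 = latest.get(s2, f"Reg[{s2}]")
        renamed.append((op, tag, src1, src2))
        latest[dest] = tag        # later readers of `dest` wait on this tag
    return renamed


program = [
    ("MULTD", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),
]
for line in rename(program):
    print(line)
```

After renaming, DIVD holds its own copy of F6 and ADDD writes to a fresh tag, so ADDD no longer has to wait for DIVD to read F6, which is the WAR stall the scoreboard example had to wait out.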

#### **Reservation Station Components**

- **Op:** Operation to perform in the unit (e.g., + or –)
- **Vj, Vk:** Values of the source operands
  - Store buffers have a V field: the result to be stored
- **Qj, Qk:** Reservation stations producing the source registers (value to be written)
  - Note: No ready flags as in Scoreboard; Qj, Qk = 0 => ready
  - Store buffers only have Qi for the RS producing the result
- **Busy:** Indicates reservation station or FU is busy

Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.
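
The fields above map naturally onto a small record type. The following is a sketch under assumptions: the class name `ReservationStation` and the use of `None` (rather than 0) for "no producer" are choices made for illustration, and the two stations are loaded with values resembling the cycle-4 state of the example that follows.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # value of source 1, once known
    vk: Optional[str] = None   # value of source 2, once known
    qj: Optional[str] = None   # station producing source 1 (None = ready)
    qk: Optional[str] = None   # station producing source 2 (None = ready)

    def ready(self) -> bool:
        """Operands are ready when neither Q field names a producer."""
        return self.busy and self.qj is None and self.qk is None


# Register result status: register -> station that will write it.
register_status: Dict[str, str] = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}

add1 = ReservationStation("Add1", True, "SUBD", vj="M(A1)", qk="Load2")
mult1 = ReservationStation("Mult1", True, "MULTD", vk="R(F4)", qj="Load2")

print(add1.ready(), mult1.ready())   # both still wait on Load2
```

There is no separate ready flag: a station becomes ready simply because its last Q field is cleared when the value arrives.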

### **Three Stages of Tomasulo Algorithm**

#### 1. Issue—get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

#### 2. Execution—operate on operands (EX)

When both operands ready then execute; if not ready, watch Common Data Bus for result

#### 3. Write result—finish execution (WB)

Write on Common Data Bus to all awaiting units; mark reservation station available

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit <u>source</u> address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast
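
The three stages can be sketched in a few functions. This is a simplified illustration, not the IBM 360/91 design: `issue`, `can_execute`, and `write_result` are hypothetical names, a station counts as free when its `op` is `None`, load buffers are represented only by their tags, and execution latency is not modeled.

```python
from typing import Dict, Optional

Station = Dict[str, Optional[str]]
EMPTY: Station = {"op": None, "vj": None, "vk": None, "qj": None, "qk": None}


def issue(stations: Dict[str, Station], reg_status: Dict[str, str],
          regs: Dict[str, str], name: str, op: str,
          dest: str, src1: str, src2: str) -> bool:
    """Stage 1: claim a free station and rename the operands."""
    if stations[name]["op"] is not None:      # station busy: structural hazard
        return False
    st: Station = dict(EMPTY)
    st["op"] = op
    for src, v, q in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        if src in reg_status:                 # value is still being computed
            st[q] = reg_status[src]           # remember the producer's tag
        else:
            st[v] = regs[src]                 # copy the value now
    stations[name] = st
    reg_status[dest] = name                   # this station now owns `dest`
    return True


def can_execute(st: Station) -> bool:
    """Stage 2: start only when both operands have arrived."""
    return st["op"] is not None and st["qj"] is None and st["qk"] is None


def write_result(stations: Dict[str, Station], reg_status: Dict[str, str],
                 regs: Dict[str, str], source: str, value: str) -> None:
    """Stage 3: broadcast (source tag, value) on the CDB; every waiter that
    expects `source` captures the value, then the station is freed."""
    for st in stations.values():
        if st["qj"] == source:
            st["vj"], st["qj"] = value, None
        if st["qk"] == source:
            st["vk"], st["qk"] = value, None
    for reg, tag in list(reg_status.items()):
        if tag == source:
            regs[reg] = value
            del reg_status[reg]
    if source in stations:
        stations[source] = dict(EMPTY)


stations = {n: dict(EMPTY) for n in ("Add1", "Add2", "Add3", "Mult1", "Mult2")}
reg_status = {"F6": "Load1", "F2": "Load2"}   # two loads already issued
regs = {"F4": "R(F4)"}

issue(stations, reg_status, regs, "Mult1", "MULTD", "F0", "F2", "F4")
write_result(stations, reg_status, regs, "Load1", "M(A1)")
issue(stations, reg_status, regs, "Add1", "SUBD", "F8", "F6", "F2")
waiting = sorted(n for n, st in stations.items() if "Load2" in (st["qj"], st["qk"]))
write_result(stations, reg_status, regs, "Load2", "M(A2)")
```

The broadcast is keyed by the *source* tag, which is what makes the CDB a "come from" bus: the producer does not need to know who is waiting, and here both `Add1` and `Mult1` pick up the Load2 result in the same step.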

#### **Tomasulo Example**

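#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 |  |  |  |
| LD | F2 | 45+ | R3 |  |  |  |
| MULTD | F0 | F2 | F4 |  |  |  |
| SUBD | F8 | F6 | F2 |  |  |  |
| DIVD | F10 | F0 | F6 |  |  |  |
| ADDD | F6 | F8 | F2 |  |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
|  | Mult1 | No |  |  |  |  |  |
|  | Mult2 | No |  |  |  |  |  |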








Note: Unlike 6600, can have multiple loads outstanding



#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | FU | Mult1 | Load2 |  | Load1 |  |  |  |  |  |

- Note: register names are removed ("renamed") in reservation stations; MULT issued vs. scoreboard
- Load1 completing; what is waiting for Load1?

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 |  |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 |  |  |
| DIVD | F10 | F0 | F6 |  |  |  |
| ADDD | F6 | F8 | F2 |  |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | Yes | 45+R3 |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | Yes | SUBD | M(A1) |  |  | Load2 |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
|  | Mult1 | Yes | MULTD |  | R(F4) | Load2 |  |
|  | Mult2 | No |  |  |  |  |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | FU | Mult1 | Load2 |  | M(A1) | Add1 |  |  |  |  |

Load2 completing; what is waiting for Load2?

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 |  |  |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 |  |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| 2 | Add1 | Yes | SUBD | M(A1) | M(A2) |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
| 10 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | FU | Mult1 | M(A2) |  | M(A1) | Add1 | Mult2 |  |  |  |

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 |  |  |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| 1 | Add1 | Yes | SUBD | M(A1) | M(A2) |  |  |
|  | Add2 | Yes | ADDD |  | M(A2) | Add1 |  |
|  | Add3 | No |  |  |  |  |  |
| 9 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | FU | Mult1 | M(A2) |  | Add2 | Add1 | Mult2 |  |  |  |

Issue ADDD here vs. scoreboard?

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 |  |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| 0 | Add1 | Yes | SUBD | M(A1) | M(A2) |  |  |
|  | Add2 | Yes | ADDD |  | M(A2) | Add1 |  |
|  | Add3 | No |  |  |  |  |  |
| 8 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | FU | Mult1 | M(A2) |  | Add2 | Add1 | Mult2 |  |  |  |

Add1 completing; what is waiting for it?

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
| 2 | Add2 | Yes | ADDD | (M-M) | M(A2) |  |  |
|  | Add3 | No |  |  |  |  |  |
| 7 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | M(A2) |  | Add2 | (M-M) | Mult2 |  |  |  |

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 |  |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
| 1 | Add2 | Yes | ADDD | (M-M) | M(A2) |  |  |
|  | Add3 | No |  |  |  |  |  |
| 6 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | FU | Mult1 | M(A2) |  | Add2 | (M-M) | Mult2 |  |  |  |

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 | 10 |  |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
| 0 | Add2 | Yes | ADDD | (M-M) | M(A2) |  |  |
|  | Add3 | No |  |  |  |  |  |
| 5 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

Add2 completing; what is waiting for it?
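
The Issue, Exec Comp, and Write Result columns in this example follow a simple pattern that can be checked with a few lines of arithmetic. The sketch below is an approximation under stated assumptions: the latencies (2 cycles for loads and adds/subtracts, 10 for multiply, 40 for divide) are inferred from the Time column and the scoreboard example rather than stated on these slides, the issue cycles are taken from the tables, and contention for reservation stations and the Common Data Bus is ignored. The names `LATENCY`, `PROGRAM`, and `timeline` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed latencies in cycles (inferred from the tables, not given explicitly).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, dest, FP source registers, issue cycle read from the instruction status table)
PROGRAM: List[Tuple[str, str, Tuple[str, ...], int]] = [
    ("LD", "F6", (), 1),
    ("LD", "F2", (), 2),
    ("MULTD", "F0", ("F2", "F4"), 3),
    ("SUBD", "F8", ("F6", "F2"), 4),
    ("DIVD", "F10", ("F0", "F6"), 5),
    ("ADDD", "F6", ("F8", "F2"), 6),
]


def timeline(program: List[Tuple[str, str, Tuple[str, ...], int]]
             ) -> List[Tuple[str, int, int, int]]:
    """Return (op, issue, exec_complete, write_result) for each instruction."""
    written: Dict[str, int] = {}   # register -> cycle its producer writes the CDB
    rows = []
    for op, dest, srcs, issue in program:
        # An operand becomes usable in the cycle its producer broadcasts it.
        start = max([issue] + [written[s] for s in srcs if s in written])
        done = start + LATENCY[op]
        rows.append((op, issue, done, done + 1))
        # Processing in program order mirrors renaming: earlier readers of
        # `dest` have already looked up the previous producer.
        written[dest] = done + 1
    return rows


for row in timeline(PROGRAM):
    print(row)
```

For example, SUBD issues in cycle 4 but its second operand arrives from Load2 in cycle 5, so it completes execution in cycle 7 and writes in cycle 8, as in the tables.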

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
| 4 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |

- Write result of ADDD here vs. scoreboard?
- All quick instructions complete in this cycle!

2/08/2012 cs252-S12, Lecture07

#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
| 3 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |



#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
| 2 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |



#### Instruction status:

| Instruction |     | j   | k  | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 |  |  |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 |  |  |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

|       | Busy | Address |
|---|---|---|
| Load1 | No |  |
| Load2 | No |  |
| Load3 | No |  |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
|  | Add1 | No |  |  |  |  |  |
|  | Add2 | No |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |
| 1 | Mult1 | Yes | MULTD | M(A2) | R(F4) |  |  |
|  | Mult2 | Yes | DIVD |  | M(A1) | Mult1 |  |



| Instruction status:   |      |       |      |       | Exec  | Write  |       |       |      |         |
|-----------------------|------|-------|------|-------|-------|--------|-------|-------|------|---------|
| Instruction           |      | j     | k    | Issue | Comp  | Result |       |       | Busy | Address |
| LD                    | F6   | 34+   | R2   | 1     | 3     | 4      |       | Load1 | No   |         |
| LD                    | F2   | 45+   | R3   | 2     | 4     | 5      |       | Load2 | No   |         |
| MULTD                 | F0   | F2    | F4   | 3     | 15    |        |       | Load3 | No   |         |
| SUBD                  | F8   | F6    | F2   | 4     | 7     | 8      |       |       |      |         |
| DIVD                  | F10  | F0    | F6   | 5     |       |        |       |       |      |         |
| ADDD                  | F6   | F8    | F2   | 6     | 10    | 11     |       |       |      |         |
| Reservation Stations: |      |       |      |       | S1    | S2     | RS    | RS    |      |         |
|                       | Time | Name  | Busy | Op    | Vj    | Vk     | Qj    | Qk    |      |         |
|                       |      | Add1  | No   |       |       |        |       |       |      |         |
|                       |      | Add2  | No   |       |       |        |       |       |      |         |
|                       |      | Add3  | No   |       |       |        |       |       |      |         |
|                       | 0    | Mult1 | Yes  | MULTD | M(A2) | R(F4)  |       |       |      |         |
|                       |      | Mult2 | Yes  | DIVD  |       | M(A1)  | Mult1 |       |      |         |



| Instruction status:   |      |       |      |       | Exec  | Write  |       |       |      |         |
|-----------------------|------|-------|------|-------|-------|--------|-------|-------|------|---------|
| Instruction           |      | j     | k    | Issue | Comp  | Result |       |       | Busy | Address |
| LD                    | F6   | 34+   | R2   | 1     | 3     | 4      |       | Load1 | No   |         |
| LD                    | F2   | 45+   | R3   | 2     | 4     | 5      |       | Load2 | No   |         |
| MULTD                 | F0   | F2    | F4   | 3     | 15    | 16     |       | Load3 | No   |         |
| SUBD                  | F8   | F6    | F2   | 4     | 7     | 8      |       |       |      |         |
| DIVD                  | F10  | F0    | F6   | 5     |       |        |       |       |      |         |
| ADDD                  | F6   | F8    | F2   | 6     | 10    | 11     |       |       |      |         |
| Reservation Stations: |      |       |      |       | S1    | S2     | RS    | RS    |      |         |
|                       | Time | Name  | Busy | Op    | Vj    | Vk     | Qj    | Qk    |      |         |
|                       |      | Add1  | No   |       |       |        |       |       |      |         |
|                       |      | Add2  | No   |       |       |        |       |       |      |         |
|                       |      | Add3  | No   |       |       |        |       |       |      |         |
|                       |      | Mult1 | No   |       |       |        |       |       |      |         |
|                       | 40   | Mult2 | Yes  | DIVD  | M*F4  | M(A1)  |       |       |      |         |


### Faster than light computation (skip a couple of cycles)

| Instruction status:   |      |       |      |       | Exec  | Write  |       |       |      |         |
|-----------------------|------|-------|------|-------|-------|--------|-------|-------|------|---------|
| Instruction           |      | j     | k    | Issue | Comp  | Result |       |       | Busy | Address |
| LD                    | F6   | 34+   | R2   | 1     | 3     | 4      |       | Load1 | No   |         |
| LD                    | F2   | 45+   | R3   | 2     | 4     | 5      |       | Load2 | No   |         |
| MULTD                 | F0   | F2    | F4   | 3     | 15    | 16     |       | Load3 | No   |         |
| SUBD                  | F8   | F6    | F2   | 4     | 7     | 8      |       |       |      |         |
| DIVD                  | F10  | F0    | F6   | 5     |       |        |       |       |      |         |
| ADDD                  | F6   | F8    | F2   | 6     | 10    | 11     |       |       |      |         |
| Reservation Stations: |      |       |      |       | S1    | S2     | RS    | RS    |      |         |
|                       | Time | Name  | Busy | Op    | Vj    | Vk     | Qj    | Qk    |      |         |
|                       |      | Add1  | No   |       |       |        |       |       |      |         |
|                       |      | Add2  | No   |       |       |        |       |       |      |         |
|                       |      | Add3  | No   |       |       |        |       |       |      |         |
|                       |      | Mult1 | No   |       |       |        |       |       |      |         |
|                       | 1    | Mult2 | Yes  | DIVD  | M*F4  | M(A1)  |       |       |      |         |

#### Register result status:

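| Clock |    | F0   | F2    | F4 | F6      | F8    | F10   | F12 | ... | F30 |
|-------|----|------|-------|----|---------|-------|-------|-----|-----|-----|
| 55    | FU | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |     |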

| Instruction status:   |      |       |      |       | Exec  | Write  |       |       |      |         |
|-----------------------|------|-------|------|-------|-------|--------|-------|-------|------|---------|
| Instruction           |      | j     | k    | Issue | Comp  | Result |       |       | Busy | Address |
| LD                    | F6   | 34+   | R2   | 1     | 3     | 4      |       | Load1 | No   |         |
| LD                    | F2   | 45+   | R3   | 2     | 4     | 5      |       | Load2 | No   |         |
| MULTD                 | F0   | F2    | F4   | 3     | 15    | 16     |       | Load3 | No   |         |
| SUBD                  | F8   | F6    | F2   | 4     | 7     | 8      |       |       |      |         |
| DIVD                  | F10  | F0    | F6   | 5     | 56    |        |       |       |      |         |
| ADDD                  | F6   | F8    | F2   | 6     | 10    | 11     |       |       |      |         |
| Reservation Stations: |      |       |      |       | S1    | S2     | RS    | RS    |      |         |
|                       | Time | Name  | Busy | Op    | Vj    | Vk     | Qj    | Qk    |      |         |
|                       |      | Add1  | No   |       |       |        |       |       |      |         |
|                       |      | Add2  | No   |       |       |        |       |       |      |         |
|                       |      | Add3  | No   |       |       |        |       |       |      |         |
|                       |      | Mult1 | No   |       |       |        |       |       |      |         |
|                       | 0    | Mult2 | Yes  | DIVD  | M*F4  | M(A1)  |       |       |      |         |

#### Register result status:

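| Clock |    | F0   | F2    | F4 | F6      | F8    | F10   | F12 | ... | F30 |
|-------|----|------|-------|----|---------|-------|-------|-----|-----|-----|
| 56    | FU | M*F4 | M(A2) |    | (M-M+M) | (M-M) | Mult2 |     |     |     |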

Mult2 is completing; what is waiting for it?
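One way to answer the question is to scan both places where a station tag can be parked: the Qj/Qk fields of the reservation stations and the register result status. The sketch below does that for the cycle-56 state shown above; the dictionary layout and the helper name `waiting_on` are illustrative assumptions, not part of the original slides.

```python
def waiting_on(tag, stations, register_status):
    """Return (stations, registers) that are still waiting for `tag`."""
    waiting_stations = [
        name for name, st in stations.items() if tag in (st["Qj"], st["Qk"])
    ]
    waiting_registers = [
        reg for reg, pending in register_status.items() if pending == tag
    ]
    return waiting_stations, waiting_registers


# State at clock 56: only Mult2 (DIVD) is busy, and F10 is tagged with it.
stations = {"Mult2": {"Op": "DIVD", "Qj": None, "Qk": None}}
register_status = {"F10": "Mult2"}

print(waiting_on("Mult2", stations, register_status))  # ([], ['F10'])
```

Under these assumptions no reservation station is waiting on Mult2; only register F10 still carries its tag.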




- Once again: in-order issue, out-of-order execution and completion.

cs252-S12, Lecture07

### **Compare to Scoreboard Cycle 62**

| Instruction status: |     |     |    |       | Read | Exec | Write  |
|---------------------|-----|-----|----|-------|------|------|--------|
| Instruction         |     | j   | k  | Issue | Oper | Comp | Result |
| LD                  | F6  | 34+ | R2 | 1     | 2    | 3    | 4      |
| LD                  | F2  | 45+ | R3 | 5     | 6    | 7    | 8      |
| MULTD               | F0  | F2  | F4 | 6     | 9    | 19   | 20     |
| SUBD                | F8  | F6  | F2 | 7     | 9    | 11   | 12     |
| DIVD                | F10 | F0  | F6 | 8     | 21   | 61   | 62     |
| ADDD                | F6  | F8  | F2 | 13    | 14   | 16   | 22     |

Tomasulo timing for the same six instructions:

| Issue | Exec Comp | Write Result |
|-------|-----------|--------------|
| 1     | 3         | 4            |
| 2     | 4         | 5            |
| 3     | 15        | 16           |
| 4     | 7         | 8            |
| 5     | 56        | 57           |
| 6     | 10        | 11           |

- Why take longer on scoreboard/6600?
  - Structural hazards
  - Lack of forwarding
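The gap between the two machines can be read straight off the timing tables above. The short sketch below only subtracts the Write Result columns; the list and variable names are illustrative, and the numbers are copied from the tables, not measured.

```python
# Write-result cycles copied from the two timing tables above.
instructions = ["LD F6", "LD F2", "MULTD F0", "SUBD F8", "DIVD F10", "ADDD F6"]
scoreboard_write = [4, 8, 20, 12, 62, 22]
tomasulo_write = [4, 5, 16, 8, 57, 11]

# Extra cycles each instruction needs before writing its result on the scoreboard.
extra_cycles = {
    name: sb - tom
    for name, sb, tom in zip(instructions, scoreboard_write, tomasulo_write)
}

for name, delta in extra_cycles.items():
    print(f"{name:10s} +{delta} cycles on the scoreboard")

print("last write, scoreboard:", max(scoreboard_write))
print("last write, Tomasulo:  ", max(tomasulo_write))
```

The largest per-instruction gap is ADDD (11 cycles): in the scoreboard table it finishes executing at cycle 16 but writes F6 only at cycle 22, one cycle after DIVD reads its operands at cycle 21, which is consistent with the WAR stall on F6.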

## Tomasulo v. Scoreboard (IBM 360/91 v. CDC 6600)

| Tomasulo (IBM 360/91)                                    | Scoreboard (CDC 6600)                                   |
|----------------------------------------------------------|---------------------------------------------------------|
| Pipelined Functional Units (6 load, 3 store, 3 +, 2 ×/÷) | Multiple Functional Units (1 load/store, 1 +, 2 ×, 1 ÷) |
| Window size: ≤ 14 instructions                           | ≤ 5 instructions                                        |
| No issue on structural hazard                            | same                                                    |
| WAR: renaming avoids                                     | stall completion                                        |
| WAW: renaming avoids                                     | stall issue                                             |
| Broadcast results from FU                                | Write/read registers                                    |
| Control: reservation stations                            | central scoreboard                                      |
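To make "WAR: renaming avoids" concrete, here is a minimal sketch of Tomasulo-style issue and result broadcast. It is a simplified model under stated assumptions, not the 360/91 design: the class names `Station` and `Machine`, the register values, and the three-instruction excerpt (MULTD, DIVD, ADDD from the earlier example, with F8 already available) are all chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Station:
    op: str
    vj: Optional[float] = None  # operand values, once available
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags of producing stations, while waiting
    qk: Optional[str] = None


class Machine:
    def __init__(self, regs: Dict[str, float]) -> None:
        self.regs = dict(regs)               # architectural register values
        self.status: Dict[str, str] = {}     # register -> station that will write it
        self.stations: Dict[str, Station] = {}

    def _read(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        """Return (value, None) if the register is ready, else (None, tag)."""
        tag = self.status.get(reg)
        if tag is not None:
            return None, tag
        return self.regs[reg], None

    def issue(self, name: str, op: str, dest: str, src1: str, src2: str) -> None:
        vj, qj = self._read(src1)
        vk, qk = self._read(src2)
        self.stations[name] = Station(op, vj, vk, qj, qk)
        self.status[dest] = name             # rename dest to this station

    def broadcast(self, name: str, value: float) -> None:
        """Deliver a result to waiting stations and to still-tagged registers."""
        self.stations.pop(name, None)
        for st in self.stations.values():
            if st.qj == name:
                st.vj, st.qj = value, None
            if st.qk == name:
                st.vk, st.qk = value, None
        for reg in [r for r, tag in self.status.items() if tag == name]:
            self.regs[reg] = value
            del self.status[reg]


m = Machine({"F0": 1.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0, "F10": 0.0})
m.issue("Mult1", "MULTD", "F0", "F2", "F4")
m.issue("Mult2", "DIVD", "F10", "F0", "F6")   # F0 pending, F6 value copied now
m.issue("Add1", "ADDD", "F6", "F8", "F2")     # later writer of F6 (WAR with DIVD)

m.broadcast("Add1", 10.0)                      # ADDD finishes first
print(m.regs["F6"], m.stations["Mult2"].vk)    # 10.0 6.0
```

Because DIVD copied the old F6 value into its station at issue, ADDD can overwrite F6 early without a stall, whereas the scoreboard has to hold ADDD's write until DIVD has read its operands.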

### Recall: Unrolled Loop That Minimizes Stalls

```
1  Loop: LD    F0,0(R1)
2        LD    F6,-8(R1)
3        LD    F10,-16(R1)
4        LD    F14,-24(R1)
5        ADDD  F4,F0,F2
6        ADDD  F8,F6,F2
7        ADDD  F12,F10,F2
8        ADDD  F16,F14,F2
9        SD    0(R1),F4
10       SD    -8(R1),F8
11       SD    -16(R1),F12
12       SUBI  R1,R1,#32
13       BNEZ  R1,LOOP
14       SD    8(R1),F16    ; 8-32 = -24
```

- What assumptions were made when the code was moved?
  - OK to move the store past SUBI even though SUBI changes the register
  - OK to move loads before stores: do they get the right data?
  - When is it safe for the compiler to make such changes?

#### 14 clock cycles, or 3.5 per iteration
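The `8-32 = -24` comment on the last store and the 3.5 figure can both be checked with a few lines of arithmetic. In the sketch below the starting value of R1 is an arbitrary assumption, and `effective_address` is a hypothetical helper for the `offset(base)` addressing form.

```python
def effective_address(base: int, offset: int) -> int:
    """Address computed by a load/store of the form offset(base)."""
    return base + offset


r1_at_loop_top = 1000  # hypothetical value of R1 on entry to the loop body

# Fourth element as the load sees it (cycle 4): LD F14,-24(R1)
load_addr = effective_address(r1_at_loop_top, -24)

# SUBI R1,R1,#32 executes at cycle 12, before the last store.
r1_after_subi = r1_at_loop_top - 32

# SD 8(R1),F16 at cycle 14 uses the already decremented R1.
store_addr = effective_address(r1_after_subi, 8)

# The adjusted offset still targets the location the element was loaded from.
assert store_addr == load_addr

cycles_per_pass = 14    # one pass through the unrolled body
elements_per_pass = 4   # four original iterations per pass
print(cycles_per_pass / elements_per_pass)  # 3.5
```

Moving the store past SUBI is therefore safe only because its offset was rewritten from -24 to 8 to compensate for the 32-byte decrement.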

#### **Tomasulo Loop Example**

| Loop: | LD    | F0 | 0    | R1 |
|-------|-------|----|------|----|
|       | MULTD | F4 | F0   | F2 |
|       | SD    | F4 | 0    | R1 |
|       | SUBI  | R1 | R1   | #8 |
|       | BNEZ  | R1 | Loop |    |

- Assume Multiply takes 4 clocks
- Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
- Reality: integer instructions ahead

## **Loop Example**

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 |       |       |        | Load1  | No   |      |       |
| 1                       | MULTD       | F4   | F0    | F2 |       |       |        | Load2  | No   |      |       |
| 1                       | SD          | F4   | 0     | R1 |       |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 |       |       |        | Store1 | No   |      |       |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | No   |       |    |       |       |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 0                       | 80          | Fu   |       |    |       |       |        |        |      |      |       |

## **Loop Example Cycle 1**

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 |       |       |        | Load2  | No   |      |       |
| 1                       | SD          | F4   | 0     | R1 |       |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 |       |       |        | Store1 | No   |      |       |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | No   |       |    |       |       |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 1                       | 80          | Fu   | Load1 |    |       |       |        |        |      |      |       |

## **Loop Example Cycle 2**

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | No   |      |       |
| 1                       | SD          | F4   | 0     | R1 |       |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 |       |       |        | Store1 | No   |      |       |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 2                       | 80          | Fu   | Load1 |    | Mult1 |       |        |        |      |      |       |



### Implicit renaming sets up "DataFlow" graph


| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | No   |      |       |
| 1                       | SD          | F4   | 0     | R1 | 3     |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 |       |       |        | Store1 | Yes  | 80   | Mult1 |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 4                       | 80          | Fu   | Load1 |    | Mult1 |       |        |        |      |      |       |

### Dispatching SUBI Instruction

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | No   |      |       |
| 1                       | SD          | F4   | 0     | R1 | 3     |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 |       |       |        | Store1 | Yes  | 80   | Mult1 |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 5                       | 72          | Fu   | Load1 |    | Mult1 |       |        |        |      |      |       |

### And, BNEZ instruction

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | Yes  | 72   |       |
| 1                       | SD          | F4   | 0     | R1 | 3     |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 | 6     |       |        | Store1 | Yes  | 80   | Mult1 |
| 2                       | MULTD       | F4   | F0    | F2 |       |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | No   |       |    |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 6                       | 72          | Fu   | Load2 |    | Mult1 |       |        |        |      |      |       |

Notice that F0 never sees Load from location 80
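The reason F0 never receives the value loaded from location 80 is that the register result status holds only the most recent writer's tag. The sketch below models just that table; the function name `broadcast`, the dictionaries, and the data values 111.0 and 222.0 are illustrative assumptions.

```python
def broadcast(regs, status, tag, value):
    """Write `value` only to registers whose pending-writer tag is still `tag`."""
    for reg, pending in status.items():
        if pending == tag:
            regs[reg] = value
            status[reg] = None


regs = {"F0": 0.0}
status = {"F0": None}

status["F0"] = "Load1"  # cycle 1: LD F0,0(R1) issues with R1 = 80
status["F0"] = "Load2"  # cycle 6: LD F0,0(R1) issues with R1 = 72, replacing the tag

broadcast(regs, status, "Load1", 111.0)  # hypothetical data from address 80
f0_after_load1 = regs["F0"]              # unchanged: F0 no longer waits on Load1

broadcast(regs, status, "Load2", 222.0)  # hypothetical data from address 72
f0_after_load2 = regs["F0"]

print(f0_after_load1, f0_after_load2)    # 0.0 222.0
```

In the full machine the Load1 result is not lost: Mult1 captured the Load1 tag in its Qj field at cycle 2, so the value goes straight to that reservation station without passing through F0.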

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | Yes  | 72   |       |
| 1                       | SD          | F4   | 0     | R1 | 3     |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 | 6     |       |        | Store1 | Yes  | 80   | Mult1 |
| 2                       | MULTD       | F4   | F0    | F2 | 7     |       |        | Store2 | No   |      |       |
| 2                       | SD          | F4   | 0     | R1 |       |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | Yes  | Multd |    | R(F2) | Load2 |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 7                       | 72          | Fu   | Load2 |    | Mult2 |       |        |        |      |      |       |

- Register file completely detached from computation
- First and second iterations completely overlapped

| Instruction status:     |             |      |       |    |       | Exec  | Write  |        |      |      |       |
|-------------------------|-------------|------|-------|----|-------|-------|--------|--------|------|------|-------|
| ITER                    | Instruction |      | j     | k  | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1                       | LD          | F0   | 0     | R1 | 1     |       |        | Load1  | Yes  | 80   |       |
| 1                       | MULTD       | F4   | F0    | F2 | 2     |       |        | Load2  | Yes  | 72   |       |
| 1                       | SD          | F4   | 0     | R1 | 3     |       |        | Load3  | No   |      |       |
| 2                       | LD          | F0   | 0     | R1 | 6     |       |        | Store1 | Yes  | 80   | Mult1 |
| 2                       | MULTD       | F4   | F0    | F2 | 7     |       |        | Store2 | Yes  | 72   | Mult2 |
| 2                       | SD          | F4   | 0     | R1 | 8     |       |        | Store3 | No   |      |       |
| Reservation Stations:   |             |      |       |    | S1    | S2    | RS     |        |      |      |       |
| Time                    | Name        | Busy | Op    | Vj | Vk    | Qj    | Qk     | Code:  |      |      |       |
|                         | Add1        | No   |       |    |       |       |        | LD     | F0   | 0    | R1    |
|                         | Add2        | No   |       |    |       |       |        | MULTD  | F4   | F0   | F2    |
|                         | Add3        | No   |       |    |       |       |        | SD     | F4   | 0    | R1    |
|                         | Mult1       | Yes  | Multd |    | R(F2) | Load1 |        | SUBI   | R1   | R1   | #8    |
|                         | Mult2       | Yes  | Multd |    | R(F2) | Load2 |        | BNEZ   | R1   | Loop |       |
| Register result status  |             |      |       |    |       |       |        |        |      |      |       |
| Clock                   | R1          |      | F0    | F2 | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 8                       | 72          | Fu   | Load2 |    | Mult2 |       |        |        |      |      |       |

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 |  | Load1 | Yes | 80 |  |
| 1 | MULTD | F4 | F0 | F2 | 2 |  |  | Load2 | Yes | 72 |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | No |  |  |
| 2 | LD | F0 | 0 | R1 | 6 |  |  | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | Yes | Multd |  | R(F2) | Load1 |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | Yes | Multd |  | R(F2) | Load2 |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 9 | 72 | Fu | Load2 |  | Mult2 |  |  |  |  |  |  |

- Load1 completing: who is waiting?
- Note: Dispatching SUBI
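The question "who is waiting?" is answered by a tag match on the common data bus. The following is a minimal sketch of that match, assuming a simplified model in which values are plain strings; the `Station` class and the `broadcast` function are hypothetical names for illustration, and the state is copied from the clock 9 table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Station:
    """One reservation station entry (field names follow the table columns)."""
    op: str
    vj: Optional[str] = None  # operand values, once known
    vk: Optional[str] = None
    qj: Optional[str] = None  # tags of producers still being waited on
    qk: Optional[str] = None

def broadcast(tag, value, stations, reg_status, reg_file):
    """Deliver `value` from producer `tag`; return the stations that captured it."""
    woken = []
    for name, rs in stations.items():
        captured = False
        if rs.qj == tag:
            rs.vj, rs.qj, captured = value, None, True
        if rs.qk == tag:
            rs.vk, rs.qk, captured = value, None, True
        if captured:
            woken.append(name)
    # The register file is written only if the register still expects this tag.
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        reg_file[reg] = value
        del reg_status[reg]
    return woken

# State at clock 9, copied from the table.
stations = {
    "Mult1": Station("Multd", vk="R(F2)", qj="Load1"),
    "Mult2": Station("Multd", vk="R(F2)", qj="Load2"),
}
reg_status = {"F0": "Load2", "F4": "Mult2"}
reg_file = {}

woken = broadcast("Load1", "M[80]", stations, reg_status, reg_file)
print(woken)     # ['Mult1']
print(reg_file)  # {} because F0 has already been renamed to Load2
```

Only Mult1 captures the value. The register file is not written at all, since F0 is already tagged with Load2.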

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 |  |  | Load2 | Yes | 72 |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | No |  |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 |  | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
| 4 | Mult1 | Yes | Multd | M[80] | R(F2) |  |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | Yes | Multd |  | R(F2) | Load2 |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 10 | 64 | Fu | Load2 |  | Mult2 |  |  |  |  |  |  |

- Load2 completing: who is waiting?
- Note: Dispatching BNEZ

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 |  |  | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
| 3 | Mult1 | Yes | Multd | M[80] | R(F2) |  |  | SUBI | R1 | R1 | #8 |
| 4 | Mult2 | Yes | Multd | M[72] | R(F2) |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 11 | 64 | Fu | Load3 |  | Mult2 |  |  |  |  |  |  |

### Next load in sequence

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 |  |  | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
| 2 | Mult1 | Yes | Multd | M[80] | R(F2) |  |  | SUBI | R1 | R1 | #8 |
| 3 | Mult2 | Yes | Multd | M[72] | R(F2) |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 12 | 64 | Fu | Load3 |  | Mult2 |  |  |  |  |  |  |

### Why not issue third multiply?

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 |  |  | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
| 1 | Mult1 | Yes | Multd | M[80] | R(F2) |  |  | SUBI | R1 | R1 | #8 |
| 2 | Mult2 | Yes | Multd | M[72] | R(F2) |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 13 | 64 | Fu | Load3 |  | Mult2 |  |  |  |  |  |  |
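At clock 13 both multiply reservation stations are busy, so the third MULTD cannot issue: this is a structural hazard at the issue stage. A minimal sketch of that check follows, assuming a hypothetical helper `free_station` that simply scans the Busy column.

```python
def free_station(busy, kind):
    """Return the first idle station whose name starts with `kind`, else None."""
    for name, is_busy in busy.items():
        if name.startswith(kind) and not is_busy:
            return name
    return None

# Busy column at clock 13, copied from the table.
busy_clock13 = {"Add1": False, "Add2": False, "Add3": False,
                "Mult1": True, "Mult2": True}
third_mult = free_station(busy_clock13, "Mult")
print(third_mult)  # None: issue stalls

# Once Mult1 has written its result (clock 15), the station is free again.
busy_clock15 = dict(busy_clock13, Mult1=False)
retry = free_station(busy_clock15, "Mult")
print(retry)       # Mult1
```

This matches the trace: the third multiply occupies Mult1 only in the clock 16 table, after the first multiply has released it.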

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 |  | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | Mult1 |
| 2 | MULTD | F4 | F0 | F2 | 7 |  |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
| 0 | Mult1 | Yes | Multd | M[80] | R(F2) |  |  | SUBI | R1 | R1 | #8 |
| 1 | Mult2 | Yes | Multd | M[72] | R(F2) |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 14 | 64 | Fu | Load3 |  | Mult2 |  |  |  |  |  |  |
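The Time column counts down the remaining execution cycles of each multiply. A small sketch of the timing arithmetic follows, assuming a 4-cycle multiply latency (read off the Time column, which starts at 4) and a write-result cycle immediately after execution completes; `multiply_timeline` is a hypothetical helper for illustration.

```python
MULT_LATENCY = 4  # cycles; assumption inferred from the Time column (4 down to 0)

def multiply_timeline(operand_clock, latency=MULT_LATENCY):
    """Clocks at which a multiply completes execution and writes its result,
    given the clock at which its last operand was broadcast."""
    exec_complete = operand_clock + latency
    write_result = exec_complete + 1
    return exec_complete, write_result

mult1 = multiply_timeline(10)  # Load1 wrote M[80] at clock 10
mult2 = multiply_timeline(11)  # Load2 wrote M[72] at clock 11
print(mult1)  # (14, 15)
print(mult2)  # (15, 16)
```

These values agree with the Exec Comp and Write Result entries for the two MULTD instructions in the trace.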

### Mult1 completing. Who is waiting?

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 | 15 | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | M[80]*R(F2) |
| 2 | MULTD | F4 | F0 | F2 | 7 | 15 |  | Store2 | Yes | 72 | Mult2 |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | No |  |  |  |  |  | SUBI | R1 | R1 | #8 |
| 0 | Mult2 | Yes | Multd | M[72] | R(F2) |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 15 | 64 | Fu | Load3 |  | Mult2 |  |  |  |  |  |  |

### Mult2 completing. Who is waiting?

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 | 15 | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | M[80]*R(F2) |
| 2 | MULTD | F4 | F0 | F2 | 7 | 15 | 16 | Store2 | Yes | 72 | M[72]*R(F2) |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | No |  |  |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | Yes | Multd |  | R(F2) | Load3 |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | No |  |  |  |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 16 | 64 | Fu | Load3 |  | Mult1 |  |  |  |  |  |  |

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 | 15 | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 |  |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | M[80]*R(F2) |
| 2 | MULTD | F4 | F0 | F2 | 7 | 15 | 16 | Store2 | Yes | 72 | M[72]*R(F2) |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | Yes | 64 | Mult1 |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | Yes | Multd |  | R(F2) | Load3 |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | No |  |  |  |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 17 | 64 | Fu | Load3 |  | Mult1 |  |  |  |  |  |  |

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 | 15 | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 | 18 |  | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | Yes | 80 | M[80]*R(F2) |
| 2 | MULTD | F4 | F0 | F2 | 7 | 15 | 16 | Store2 | Yes | 72 | M[72]*R(F2) |
| 2 | SD | F4 | 0 | R1 | 8 |  |  | Store3 | Yes | 64 | Mult1 |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | Yes | Multd |  | R(F2) | Load3 |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | No |  |  |  |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 18 | 64 | Fu | Load3 |  | Mult1 |  |  |  |  |  |  |

| Instruction status: |  |  |  |  |  | Exec | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ITER | Instruction |  | j | k | Issue | Comp | Result |  | Busy | Addr | Fu |
| 1 | LD | F0 | 0 | R1 | 1 | 9 | 10 | Load1 | No |  |  |
| 1 | MULTD | F4 | F0 | F2 | 2 | 14 | 15 | Load2 | No |  |  |
| 1 | SD | F4 | 0 | R1 | 3 | 18 | 19 | Load3 | Yes | 64 |  |
| 2 | LD | F0 | 0 | R1 | 6 | 10 | 11 | Store1 | No |  |  |
| 2 | MULTD | F4 | F0 | F2 | 7 | 15 | 16 | Store2 | Yes | 72 | M[72]*R(F2) |
| 2 | SD | F4 | 0 | R1 | 8 | 19 |  | Store3 | Yes | 64 | Mult1 |
| Reservation Stations: |  |  |  |  | S1 | S2 | RS |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |  |
|  | Add1 | No |  |  |  |  |  | LD | F0 | 0 | R1 |
|  | Add2 | No |  |  |  |  |  | MULTD | F4 | F0 | F2 |
|  | Add3 | No |  |  |  |  |  | SD | F4 | 0 | R1 |
|  | Mult1 | Yes | Multd |  | R(F2) | Load3 |  | SUBI | R1 | R1 | #8 |
|  | Mult2 | No |  |  |  |  |  | BNEZ | R1 | Loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 19 | 64 | Fu | Load3 |  | Mult1 |  |  |  |  |  |  |

| Instruction status: |       |       |       |       |       | Exec  | Write  |        |      |      |       |
|-----------|-----------|-------|-------|-------|-------|-------|--------|--------|------|------|-------|
| ITER      | Instruction |     | j     | k     | Issue | Comp  | Result |        | Busy | Addr | Fu    |
| 1         | LD        | F0    | 0     | R1    | 1     | 9     | 10     | Load1  | No   |      |       |
| 1         | MULTD     | F4    | F0    | F2    | 2     | 14    | 15     | Load2  | No   |      |       |
| 1         | SD        | F4    | 0     | R1    | 3     | 18    | 19     | Load3  | Yes  | 64   |       |
| 2         | LD        | F0    | 0     | R1    | 6     | 10    | 11     | Store1 | No   |      |       |
| 2         | MULTD     | F4    | F0    | F2    | 7     | 15    | 16     | Store2 | No   |      |       |
| 2         | SD        | F4    | 0     | R1    | 8     | 19    | 20     | Store3 | Yes  | 64   | Mult1 |
| Reservation stations: |   |       |       |       | S1    | S2    | RS     |        |      |      |       |
| Time      | Name      | Busy  | Op    | Vj    | Vk    | Qj    | Qk     | Code:  |      |      |       |
|           | Add1      | No    |       |       |       |       |        | LD     | F0   | 0    | R1    |
|           | Add2      | No    |       |       |       |       |        | MULTD  | F4   | F0   | F2    |
|           | Add3      | No    |       |       |       |       |        | SD     | F4   | 0    | R1    |
|           | Mult1     | Yes   | Multd |       | R(F2) | Load3 |        | SUBI   | R1   | R1   | #8    |
|           | Mult2     | No    |       |       |       |       |        | BNEZ   | R1   | Loop |       |
| Register result status |  |       |       |       |       |       |        |        |      |      |       |
| Clock     | R1        |       | F0    | F2    | F4    | F6    | F8     | F10    | F12  | ...  | F30   |
| 20        | 64        | Fu    | Load3 |       | Mult1 |       |        |        |      |      |       |

# Why can Tomasulo overlap iterations of loops?

- Register renaming
  - Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
- Reservation stations
  - Permit instruction issue to advance past integer control flow operations
- Another view: Tomasulo builds a dynamic "DataFlow" graph from the instructions
  - Fits in with readings for Wednesday
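
The renaming effect can be sketched in a few lines of Python. This is an illustrative model only, not the 360/91 hardware: the `issue` helper, the `reg_status` dictionary, and the order in which stations are handed out are assumptions made for the example, and execution and write-back are ignored. Each instruction records, at issue time, which tag will produce each of its sources, and then claims its destination register with its own tag.

```python
# Minimal sketch of Tomasulo-style implicit renaming (illustrative only).
# Station names and their allocation order are assumptions, not real hardware.

def issue(trace, reg_status, station, dest, sources):
    """Record where each source will come from, then claim the destination."""
    # A source comes either from a pending producer tag or from the register file.
    producers = [reg_status.get(src, "regfile") for src in sources]
    trace.append((station, dest, producers))
    if dest is not None:
        reg_status[dest] = station  # later readers of dest wait on this tag


def run_two_iterations():
    reg_status = {}  # architectural register -> tag of its pending producer
    trace = []
    # Iteration 1: LD F0 / MULTD F4,F0,F2 / SD F4
    issue(trace, reg_status, "Load1", "F0", [])
    issue(trace, reg_status, "Mult1", "F4", ["F0", "F2"])
    issue(trace, reg_status, "Store1", None, ["F4"])
    # Iteration 2 reuses the same architectural names but gets fresh tags.
    issue(trace, reg_status, "Load2", "F0", [])
    issue(trace, reg_status, "Mult2", "F4", ["F0", "F2"])
    issue(trace, reg_status, "Store2", None, ["F4"])
    return trace


trace = run_two_iterations()
for station, dest, producers in trace:
    print(station, dest, producers)
```

The second MULTD waits on `Load2`, not `Load1`, even though both loads write F0. The two iterations therefore carry no WAR or WAW dependence on each other, which is what lets them overlap.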

# **Summary**

- Scoreboard: Track dependencies through reservations
  - Simple scheme for out-of-order execution
  - WAW and WAR hazards force stalls; cannot handle multiple in-flight instructions with the same destination register
- Reservation stations: renaming to a larger set of registers + buffering of source operands
  - Prevents the register file from becoming a bottleneck
  - Avoids WAR, WAW hazards of Scoreboard
  - Allows loop unrolling in HW
- Dynamic hardware schemes can unroll loops dynamically in hardware
  - Form of limited dataflow
  - Register renaming is essential
- Lasting Contributions of Tomasulo Algorithm
  - Dynamic scheduling
  - Register renaming
  - Load/store disambiguation
- 360/91 descendants are Pentium II; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264




# **Explicit Register Renaming**

- Tomasulo provides Implicit Register Renaming
  - User registers renamed to reservation station tags
- Explicit Register Renaming:
  - Use physical register file that is larger than number of registers specified by ISA
- Keep a translation table:
  - ISA register => physical register mapping
  - When a register is written, replace its table entry with a new register from the free list.
  - A physical register becomes free when it is not being used by any instruction in progress.
- Pipeline can be exactly like "standard" DLX pipeline
  - IF, ID, EX, etc....
- Advantages:
  - Removes all WAR and WAW hazards
  - Like Tomasulo, good for allowing full out-of-order completion
  - Allows data to be fetched from a single register file
  - Makes speculative execution/precise interrupts easier:
    - All that needs to be "undone" for a precise break point is to undo the table mappings
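
The translation table and free list can be sketched as follows. The `RenameTable` class and its `rename` method are hypothetical names, and the size of the free list (P32 to P62, even numbers only) is an assumption; the initial mapping of each Fn to Pn and the six-instruction program are taken from the renamed scoreboard example in these slides.

```python
# Sketch of explicit register renaming with a translation table and a free list.
# Class and method names are hypothetical; the free-list size is an assumption.

from collections import deque


class RenameTable:
    def __init__(self, arch_regs, free_regs):
        # Initially each ISA register Fn maps to physical register Pn.
        self.table = {r: "P" + r[1:] for r in arch_regs}
        self.free = deque(free_regs)

    def rename(self, dest, sources):
        # Sources are looked up BEFORE the destination is remapped.
        phys_sources = [self.table[s] for s in sources]
        phys_dest = self.free.popleft()  # raises IndexError if nothing is free
        self.table[dest] = phys_dest
        return phys_dest, phys_sources


arch = ["F%d" % n for n in range(0, 32, 2)]
free = ["P%d" % n for n in range(32, 64, 2)]
rt = RenameTable(arch, free)

program = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]
renamed = [(op,) + rt.rename(dest, srcs) for op, dest, srcs in program]
for row in renamed:
    print(row)
```

DIVD reads the old F6 value from P32 while ADDD writes the new F6 value to P42, so the WAR hazard between them is gone. Undoing a mis-speculated instruction in this model would only require restoring the old table entry.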







2/9/2011




## **Scoreboard Example**

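#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 |       |           |           |              |
| LD          | F2  | 45+ | R3 |       |           |           |              |
| MULTD       | F0  | F2  | F4 |       |           |           |              |
| SUBD        | F8  | F6  | F2 |       |           |           |              |
| DIVD        | F10 | F0  | F6 |       |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|----|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |    |           |         |         |         |         |          |          |
|      | Int2   | No   |    |           |         |         |         |         |          |          |
|      | Mult1  | No   |    |           |         |         |         |         |          |          |
|      | Add    | No   |    |           |         |         |         |         |          |          |
|      | Divide | No   |    |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
|       | FU | P0 | P2 | P4 | P6 | P8 | P10 | P12 |     | P30 |

- Initialized Rename Table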



### **Renamed Scoreboard 1**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     |           |           |              |
| LD          | F2  | 45+ | R3 |       |           |           |              |
| MULTD       | F0  | F2  | F4 |       |           |           |              |
| SUBD        | F8  | F6  | F2 |       |           |           |              |
| DIVD        | F10 | F0  | F6 |       |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | Yes  | Load | P32       |         | R2      |         |         |          | Yes      |
|      | Int2   | No   |      |           |         |         |         |         |          |          |
|      | Mult1  | No   |      |           |         |         |         |         |          |          |
|      | Add    | No   |      |           |         |         |         |         |          |          |
|      | Divide | No   |      |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|-----|-----|-----|-----|
| 1     | FU | P0 | P2 | P4 | P32 | P8 | P10 | P12 |     | P30 |

- Each instruction allocates a free register
- Similar to single-assignment compiler transformation



### **Renamed Scoreboard 2**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         |           |              |
| LD          | F2  | 45+ | R3 | 2     |           |           |              |
| MULTD       | F0  | F2  | F4 |       |           |           |              |
| SUBD        | F8  | F6  | F2 |       |           |           |              |
| DIVD        | F10 | F0  | F6 |       |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | Yes  | Load | P32       |         | R2      |         |         |          | Yes      |
|      | Int2   | Yes  | Load | P34       |         | R3      |         |         |          | Yes      |
|      | Mult1  | No   |      |           |         |         |         |         |          |          |
|      | Add    | No   |      |           |         |         |         |         |          |          |
|      | Divide | No   |      |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0 | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-----|----|-----|----|-----|-----|-----|-----|
| 2     | FU | P0 | P34 | P4 | P32 | P8 | P10 | P12 |     | P30 |



### **Renamed Scoreboard 3**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         |              |
| LD          | F2  | 45+ | R3 | 2     | 3         |           |              |
| MULTD       | F0  | F2  | F4 | 3     |           |           |              |
| SUBD        | F8  | F6  | F2 |       |           |           |              |
| DIVD        | F10 | F0  | F6 |       |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | Yes  | Load  | P32       |         | R2      |         |         |          | Yes      |
|      | Int2   | Yes  | Load  | P34       |         | R3      |         |         |          | Yes      |
|      | Mult1  | Yes  | Multd | P36       | P34     | P4      | Int2    |         | No       | Yes      |
|      | Add    | No   |       |           |         |         |         |         |          |          |
|      | Divide | No   |       |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|----|-----|-----|-----|-----|
| 3     | FU | P36 | P34 | P4 | P32 | P8 | P10 | P12 |     | P30 |



### **Renamed Scoreboard 4**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         |              |
| MULTD       | F0  | F2  | F4 | 3     |           |           |              |
| SUBD        | F8  | F6  | F2 | 4     |           |           |              |
| DIVD        | F10 | F0  | F6 |       |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | Yes  | Load  | P34       |         | R3      |         |         |          | Yes      |
|      | Mult1  | Yes  | Multd | P36       | P34     | P4      | Int2    |         | No       | Yes      |
|      | Add    | Yes  | Sub   | P38       | P32     | P34     |         | Int2    | Yes      | No       |
|      | Divide | No   |       |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 4     | FU | P36 | P34 | P4 | P32 | P38 | P10 | P12 |     | P30 |



### **Renamed Scoreboard 5**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     |           |           |              |
| SUBD        | F8  | F6  | F2 | 4     |           |           |              |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
|      | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
|      | Add    | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 5     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |
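
The Rj/Rk flags decide which units may proceed to Read Operands. A minimal sketch of that check, using the clock-5 functional unit state from the table above, is given here; the dictionary layout and the `can_read_operands` helper are assumptions made for illustration.

```python
# Sketch: which busy functional units have both source operands ready?
# State copied from the clock-5 functional unit status table; the dict layout
# and helper name are assumptions made for this example.

fu_status = {
    "Int1": {"busy": False},
    "Int2": {"busy": False},
    "Mult1": {"busy": True, "op": "Multd", "Qj": None, "Qk": None, "Rj": True, "Rk": True},
    "Add": {"busy": True, "op": "Sub", "Qj": None, "Qk": None, "Rj": True, "Rk": True},
    "Divide": {"busy": True, "op": "Divd", "Qj": "Mult1", "Qk": None, "Rj": False, "Rk": True},
}


def can_read_operands(entry):
    """A unit may read its operands once both ready flags are set."""
    return entry["busy"] and entry["Rj"] and entry["Rk"]


ready = [name for name, e in fu_status.items() if can_read_operands(e)]
waiting = {
    name: e["Qj"] or e["Qk"]
    for name, e in fu_status.items()
    if e["busy"] and not can_read_operands(e)
}
print("ready:", ready)
print("waiting on:", waiting)
```

Mult1 and Add are ready, which agrees with both reading operands at clock 6, while Divide keeps waiting for Mult1 to produce P36.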



### **Renamed Scoreboard 6**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         |           |              |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 10   | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 2    | Add    | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 6     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |
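
The Time column is a countdown of remaining execution cycles. The arithmetic behind it can be sketched as below; the latencies (10 cycles for Mult1, 2 for Add) are read off the Time column at clock 6, while the `timing` and `remaining` helpers and the rule that the result is written one cycle after completion are assumptions that happen to match the rows of this example.

```python
# Sketch: derive Exec Comp / Write Result from Read Oper plus unit latency.
# LATENCY values are read off the Time column (10 for Mult1, 2 for Add);
# writing the result one cycle after completion is an assumption that
# holds for these rows (no write-back stall occurs in this example).

LATENCY = {"MULTD": 10, "SUBD": 2, "ADDD": 2}


def timing(op, read_oper):
    """Return (exec_comp, write_result) for an instruction."""
    exec_comp = read_oper + LATENCY[op]
    return exec_comp, exec_comp + 1


def remaining(op, read_oper, clock):
    """Value shown in the Time column at a given clock."""
    exec_comp, _ = timing(op, read_oper)
    return max(exec_comp - clock, 0)


schedule = {
    "MULTD": timing("MULTD", 6),
    "SUBD": timing("SUBD", 6),
    "ADDD": timing("ADDD", 11),
}
for op, (done, written) in schedule.items():
    print(op, "Exec Comp:", done, "Write Result:", written)
print("Mult1 time left at clock 9:", remaining("MULTD", 6, 9))
```

These values agree with the Exec Comp and Write Result columns in the later snapshots: SUBD at 8 and 9, ADDD at 13 and 14, MULTD at 16 and 17.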



### **Renamed Scoreboard 7**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         |           |              |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 9    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 1    | Add    | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 7     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 8**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         |              |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 8    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 0    | Add    | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 8     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 9**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 7    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
|      | Add    | No   |       |           |         |         |         |         |          |          |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 9     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 10**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    |           |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 6    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
|      | Add    | Yes  | Addd  | P42       | P38     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

- WAR hazard gone! ADDD writes its result to P42, while DIVD still reads the old F6 value from P32

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 10    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

- Notice that P32 is not listed in the Rename Table
  - Still live. Must not be reallocated by accident
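
The "still live" condition can be written as a small check. The sketch below applies the simple rule stated earlier (a physical register is free once no instruction in progress uses it) to the clock-10 state; the `can_free` helper and the data layout are assumptions, and only the registers shown in the table are modelled. A real design would typically tie reclamation to instruction commit as well.

```python
# Sketch: may a physical register be returned to the free list?
# State reflects clock 10 of the renamed scoreboard example; the helper name
# and data layout are assumptions made for illustration.

rename_table = {
    "F0": "P36", "F2": "P34", "F4": "P4",
    "F6": "P42", "F8": "P38", "F10": "P40",
}

# Instructions that have issued but not yet written their result.
in_flight = [
    {"op": "MULTD", "dest": "P36", "sources": ["P34", "P4"]},
    {"op": "DIVD", "dest": "P40", "sources": ["P36", "P32"]},
    {"op": "ADDD", "dest": "P42", "sources": ["P38", "P34"]},
]


def can_free(preg):
    """True if preg is neither currently mapped nor named by any in-flight instruction."""
    if preg in rename_table.values():
        return False
    return all(
        preg != instr["dest"] and preg not in instr["sources"]
        for instr in in_flight
    )


for preg in ["P32", "P6", "P42"]:
    print(preg, "can be freed:", can_free(preg))
```

P32 has dropped out of the rename table but DIVD still names it as a source, so it must stay allocated. P6, the original mapping of F6, is no longer referenced by anything in this model.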



### **Renamed Scoreboard 11**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    | 11        |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 5    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 2    | Add    | Yes  | Addd  | P42       | P38     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 11    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 12**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    | 11        |           |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 4    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 1    | Add    | Yes  | Addd  | P42       | P38     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 12    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 13**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    | 11        | 13        |              |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 3    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 0    | Add    | Yes  | Addd  | P42       | P38     | P34     |         |         | Yes      | Yes      |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 13    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 14**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    | 11        | 13        | 14           |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 2    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
|      | Add    | No   |       |           |         |         |         |         |          |          |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 14    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



### **Renamed Scoreboard 15**

#### Instruction status:

| Instruction |     | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|-----|-----|----|-------|-----------|-----------|--------------|
| LD          | F6  | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2  | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0  | F2  | F4 | 3     | 6         |           |              |
| SUBD        | F8  | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10 | F0  | F6 | 5     |           |           |              |
| ADDD        | F6  | F8  | F2 | 10    | 11        | 13        | 14           |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|------|--------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
|      | Int1   | No   |       |           |         |         |         |         |          |          |
|      | Int2   | No   |       |           |         |         |         |         |          |          |
| 1    | Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
|      | Add    | No   |       |           |         |         |         |         |          |          |
|      | Divide | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 15    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



| Instruction | n sta | tus: |    |       | Read | Exec | Write  |
|-------------|-------|------|----|-------|------|------|--------|
| Instructio  | n     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    | 16   |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    | 11   | 13   | 14     |

#### Functional unit status:

| Time | Name   | Busy | Op    | Fi  | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
|------|--------|------|-------|-----|-----|-----|-------|----|-----|-----|
|      | Int1   | No   |       |     |     |     |       |    |     |     |
|      | Int2   | No   |       |     |     |     |       |    |     |     |
| 0    | Mult1  | Yes  | Multd | P36 | P34 | P4  |       |    | Yes | Yes |
|      | Add    | No   |       |     |     |     |       |    |     |     |
|      | Divide | Yes  | Divd  | P40 | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 16    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 6         | 16        | 17           |
| SUBD        | F8   | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10  | F0  | F6 | 5     |           |           |              |
| ADDD        | F6   | F8  | F2 | 10    | 11        | 13        | 14           |

#### Functional unit status:

| Time | Name   | Busy | Op   | Fi  | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
|------|--------|------|------|-----|-----|-----|-------|----|-----|-----|
|      | Int1   | No   |      |     |     |     |       |    |     |     |
|      | Int2   | No   |      |     |     |     |       |    |     |     |
|      | Mult1  | No   |      |     |     |     |       |    |     |     |
|      | Add    | No   |      |     |     |     |       |    |     |     |
|      | Divide | Yes  | Divd | P40 | P36 | P32 | Mult1 |    | Yes | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 17    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     | 6         | 16        | 17           |
| SUBD        | F8   | F6  | F2 | 4     | 6         | 8         | 9            |
| DIVD        | F10  | F0  | F6 | 5     | 18        |           |              |
| ADDD        | F6   | F8  | F2 | 10    | 11        | 13        | 14           |

#### Functional unit status:

| Time | Name   | Busy | Op   | Fi  | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
|------|--------|------|------|-----|-----|-----|-------|----|-----|-----|
|      | Int1   | No   |      |     |     |     |       |    |     |     |
|      | Int2   | No   |      |     |     |     |       |    |     |     |
|      | Mult1  | No   |      |     |     |     |       |    |     |     |
|      | Add    | No   |      |     |     |     |       |    |     |     |
| 40   | Divide | Yes  | Divd | P40 | P36 | P32 | Mult1 |    | Yes | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 18    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |



# **Explicit Renaming Support Includes:**

- Rapid access to a table of translations
- A physical register file that has more registers than specified by the ISA
- Ability to figure out which physical registers are free.
  - No free registers ⇒ stall on issue
- Thus, register renaming doesn't require reservation stations. However:
  - Many modern architectures use explicit register renaming + Tomasulo-like reservation stations to control execution.

# How many instructions can be in the pipeline?



Which features of an ISA limit the number of instructions in the pipeline?

Number of Registers

Which features of a program limit the number of instructions in the pipeline?

Control transfers

Out-of-order dispatch by itself does not provide any significant performance improvement!





| # | Instruction      | Latency |
|---|------------------|---------|
| 1 | LD F2, 34(R2)    | 1       |
| 2 | LD F4, 45(R3)    | long    |
| 3 | MULTD F6, F4, F2 | 3       |
| 4 | SUBD F8, F2, F2  | 1       |
| 5 | DIVD F4, F2, F8  | 4       |
| 6 | ADDD F10, F6, F4 | 1       |

In-order: 1 (2,1) . <u>2</u> 3 4 <u>4</u> <u>3</u> 5 . <u>5</u> 6 <u>6</u>

Out-of-order: 1 (2,<u>1</u>) 4 <u>4</u> . <u>2</u> 3 <u>3</u> 5 . <u>5</u> 6 <u>6</u>

Out-of-order execution did not allow any significant improvement!



## Little's Law

## Throughput (T) = Number in Flight (N) / Latency (L)



## Example:

- 4 floating point registers
- 8 cycles per floating point operation
- ⇒ maximum of ½ issue per cycle without renaming!
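The bound in this example follows from Little's Law. As a worked step, assuming each in-flight floating-point operation holds one of the 4 registers as its destination for its full 8-cycle latency (so at most $N = 4$ operations can be in flight):

$$T = \frac{N}{L} = \frac{4\ \text{operations in flight}}{8\ \text{cycles}} = \frac{1}{2}\ \text{issue per cycle}$$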



## Instruction-level Parallelism via Renaming



In-order: 1(2,1)......234435....566

Out-of-order: 1(2,1) 4 4 5 . . . 2(3,5) 3 6 6

Any antidependence can be eliminated by renaming (renaming ⇒ additional storage).

Can be done either in Software or Hardware

2/9/2011 CS252-S11, lecture 7 118



## **Data-Flow Architectures**

Basic Idea: Hardware represents direct encoding of compiler dataflow graphs:

- Data flows along arcs in "Tokens".
- When two tokens arrive at compute box, box "fires" and produces new token.
- Split operations produce copies of tokens
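The firing rule above can be sketched in a few lines of Python. This is only an illustration of the token model, not the Dennis and Misunas machine: the `Node` class, the `run` function, and the port numbering are hypothetical names chosen for the sketch.

```python
import operator
from collections import deque

class Node:
    """A two-input compute box (hypothetical model for illustration)."""
    def __init__(self, op, dests):
        self.op = op          # function applied when the box fires
        self.dests = dests    # outgoing arcs: list of (node_name, port); empty = graph output
        self.inputs = {}      # port -> token value waiting at this box

def run(nodes, initial_tokens):
    """Deliver tokens until none remain; return {node_name: value} for output boxes."""
    pending = deque(initial_tokens)          # tokens in flight: (node_name, port, value)
    outputs = {}
    while pending:
        name, port, value = pending.popleft()
        node = nodes[name]
        node.inputs[port] = value
        if len(node.inputs) == 2:            # both tokens arrived: the box fires
            result = node.op(node.inputs[0], node.inputs[1])
            node.inputs.clear()
            if not node.dests:
                outputs[name] = result
            for dest_name, dest_port in node.dests:   # split: one token copy per arc
                pending.append((dest_name, dest_port, result))
    return outputs

# Dataflow graph for (a + b) * (a - b) with a = 7, b = 3
nodes = {
    "add": Node(operator.add, [("mul", 0)]),
    "sub": Node(operator.sub, [("mul", 1)]),
    "mul": Node(operator.mul, []),
}
tokens = [("add", 0, 7), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 3)]
result = run(nodes, tokens)
```

Note that nothing in `run` follows program order: a box executes as soon as its data is present.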





# Paper by Dennis and Misunas



# What about Precise Exceptions/Interrupts?



- Both Scoreboard and Tomasulo have:
  - In-order issue, out-of-order execution, out-of-order completion
- Recall: An interrupt or exception is precise if there is a single instruction for which:
  - All instructions before that have committed their state
  - No following instructions (including the interrupting instruction) have modified any state.
- Need way to resynchronize execution with instruction stream (i.e., with issue order)
  - Easiest way is with in-order completion (i.e. reorder buffer)
  - Other Techniques (Smith paper): Future File, History Buffer

# **Exception Handling**

(In-Order Five-Stage Pipeline)



Commit

- Hold exception flags in pipeline until commit point (M stage)
- Exceptions in earlier pipe stages override later exceptions
- Inject external interrupts at commit point (override others)
- If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
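The commit-point rules above can be sketched as a small function. This is a minimal sketch under stated assumptions: `HANDLER_PC`, `CommitResult`, and `commit_point` are hypothetical names, and exception flags are modeled as strings carried with the instruction, ordered from earliest pipe stage to latest.

```python
from dataclasses import dataclass
from typing import List, Optional

HANDLER_PC = 0x80000180   # hypothetical handler address, for illustration only

@dataclass
class CommitResult:
    cause: Optional[str]          # value written to the Cause register, if any
    epc: Optional[int]            # value written to the EPC register, if any
    next_fetch_pc: Optional[int]  # PC injected into the fetch stage, if any
    kill_pipeline: bool           # True when all stages must be killed

def commit_point(pc: int, stage_flags: List[Optional[str]],
                 external_interrupt: Optional[str] = None) -> CommitResult:
    """Decide what happens when the instruction at `pc` reaches the commit point.

    stage_flags holds the exception flag raised in each pipe stage,
    earliest stage first (None means no exception in that stage).
    """
    if external_interrupt is not None:
        cause = external_interrupt          # external interrupts override others
    else:
        # the exception from the earliest stage overrides later ones
        cause = next((f for f in stage_flags if f is not None), None)
    if cause is None:
        return CommitResult(None, None, None, False)   # normal commit
    return CommitResult(cause, pc, HANDLER_PC, True)

r = commit_point(0x100, [None, "illegal_opcode", "overflow"])
```

Here `r.cause` is `"illegal_opcode"`: the decode-stage exception wins over the later overflow.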




# **Complex In-Order Pipeline: Precise Exceptions**





# **In-Order Commit for Precise Exceptions**



- Instructions fetched and decoded into instruction reorder buffer in-order
- Execution is out-of-order ( ⇒ out-of-order completion)
- Commit (write-back to architectural state, i.e., regfile & memory) is in-order

Temporary storage needed to hold results before commit (shadow registers and store buffers)

## In-Order versus Out-of-Order Phases

- Instruction fetch/decode/rename always in-order
  - Need to parse ISA sequentially to get correct semantics
  - CS252: Proposals for speculative OoO instruction fetch, e.g., Multiscalar. Predict control flow and data dependencies across sequential program segments fetched/decoded/executed in parallel, fix up if prediction wrong
- Dispatch (place instruction into machine buffers to wait for issue) also always in-order
  - Some use "Dispatch" to mean "Issue", but not in these lectures

## **In-Order Versus Out-of-Order Issue**

# In-order (InO) issue:

- Issue stalls on RAW dependencies or structural hazards, or possibly WAR/WAW hazards
- Instruction cannot issue to execution units unless all preceding instructions have issued to execution units

# Out-of-order (OoO) issue:

- Instructions dispatched in program order to reservation stations (or other forms of instruction buffer) to wait for operands to arrive, or other hazards to clear
- While earlier instructions wait in issue buffers, following instructions can be dispatched and issued out-of-order

# **In-Order versus Out-of-Order Completion**

- All but simplest machines have out-of-order completion, due to different latencies of functional units and desire to bypass values as soon as available
- Classic RISC 5-stage integer pipeline just barely has in-order completion
  - Load takes two cycles, but following one-cycle integer op completes at same time, not earlier
  - Adding pipelined FPU immediately brings OoO completion

## **In-Order versus Out-of-Order Commit**

- In-order commit supports precise traps, standard today
  - CS252: Some proposals to reduce the cost of in-order commit by retiring some instructions early to compact reorder buffer, but this is just an optimized in-order commit
- Out-of-order commit was effectively what early OoO machines implemented (imprecise traps) as completion irrevocably changed machine state
  - i.e., complete == commit in these machines

# **OoO Design Choices**

- Where are reservation stations?
  - Part of reorder buffer, or in separate issue window?
  - Distributed by functional units, or centralized?
- How is register renaming performed?
  - Tags and data held in reservation stations, with separate architectural register file
  - Tags only in reservation stations, data held in unified physical register file

# What are the hardware complexities with reorder buffer (ROB)?





- How do you find the latest version of a register?
  - As specified by Smith paper, need associative comparison network
  - Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value
- Need as many ports on ROB as register file



# Four Steps of Speculative Tomasulo

## 1. Issue: get instruction from FP Op Queue

If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch")

## 2. Execution: operate on operands (EX)

When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (this stage is sometimes called "issue")

## 3. Write result: finish execution (WB)

Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.

## 4. Commit: update register with reorder result

When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called "graduation")





































- Question: Given a load that follows a store in program order, are the two related?
  - (Alternatively: is there a RAW hazard between the store and the load)?

Eg: `st 0(R2),R5` followed by `ld R6,0(R3)`

- Can we go ahead and start the load early?
  - Store address could be delayed for a long time by some calculation that leads to R2 (divide?).
  - We might want to issue/begin execution of both operations in same cycle.
  - Today: Answer is that we are not allowed to start load until we know that address 0(R2) ≠ 0(R3)
  - Next Week: We might guess at whether or not they are dependent (called "dependence speculation") and use reorder buffer to fix up if we are wrong.

# Hardware Support for Memory Disambiguation



- Need buffer to keep track of all outstanding stores to memory, in program order.
  - Keep track of address (when becomes available) and value (when becomes available)
  - FIFO ordering: will retire stores from this buffer in program order
- When issuing a load, record current head of store queue (know which stores are ahead of you).
- When have address for load, check store queue:
  - If any store prior to load is waiting for its address, stall load.
  - If load address matches earlier store address (associative lookup), then we have a memory-induced RAW hazard:
    - store value available ⇒ return value
    - store value not available ⇒ return ROB number of source
  - Otherwise, send out request to memory
- Actual stores commit in order, so no worry about WAR/WAW hazards through memory.
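The load check described above can be sketched as follows. This is a simplified model, not a description of any real load/store unit: `StoreEntry`, `check_load`, and the `num_older` argument (the number of stores ahead of the load, recorded when the load issues) are names assumed for the sketch, and the queue is assumed not to retire entries between load issue and the check.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StoreEntry:
    addr: Optional[int] = None     # None until the store address is computed
    value: Optional[int] = None    # None until the store data is available
    src_rob: Optional[int] = None  # ROB number of the instruction producing the data

def check_load(store_queue: List[StoreEntry], num_older: int,
               load_addr: int) -> Tuple[str, Optional[int]]:
    """Check a load against older stores (store_queue is in program order, oldest first).

    Returns ("stall", None), ("value", v), ("wait_rob", rob_number) or ("memory", None).
    """
    older = store_queue[:num_older]            # only stores ahead of the load matter
    if any(s.addr is None for s in older):
        return ("stall", None)                 # some earlier store address is unknown
    for s in reversed(older):                  # youngest matching earlier store wins
        if s.addr == load_addr:                # memory-induced RAW hazard
            if s.value is not None:
                return ("value", s.value)      # forward the store data
            return ("wait_rob", s.src_rob)     # wait on the producer of the store data
    return ("memory", None)                    # no conflict: send request to memory

queue = [StoreEntry(addr=0x100, value=5), StoreEntry(addr=0x100, src_rob=9)]
outcome = check_load(queue, 2, 0x100)
```

In this example `outcome` is `("wait_rob", 9)`: the younger of the two matching stores supplies the data, and its value is not ready yet.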





# "Data-in-ROB" Design

## (HP PA8000, Pentium Pro, Core2Duo, Nehalem)

| Pointer  | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |
|----------|---|---|--------|---|-----|------|---|-----|------|---|-----|--------|---------|
|          | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |
| Oldest → | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |
|          | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |
| Free →   | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |
|          | v | i | Opcode | p | Tag | Src1 | p | Tag | Src2 | p | Reg | Result | Except? |

- Managed as circular buffer in program order, new instructions dispatched to free slots, oldest instruction committed/reclaimed when done ("p" bit set on result)
- Tag is given by index in ROB (Free pointer value)
- In dispatch, non-busy source operands read from architectural register file and copied to Src1 and Src2 with presence bit "p" set. Busy operands copy tag of producer and clear "p" bit.
- Set valid bit "v" on dispatch, set issued bit "i" on issue
- On completion, search source tags, set "p" bit and copy data into src on tag match. Write result and exception flags to ROB.
- On commit, check exception status, and copy result into architectural register file if no trap.
- On trap, flush machine and ROB, set free=oldest, jump to handler
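The bullets above can be tied together in a small behavioral model. This is a sketch under stated assumptions, not the PA8000 or Pentium Pro design: the class and method names are hypothetical, each instruction has two register sources and one destination, the `producer` list stands in for the rename state, and a finished but uncommitted producer is bypassed from the ROB at dispatch.

```python
class DataInROB:
    def __init__(self, size, num_regs):
        self.size = size
        self.entries = [None] * size        # circular buffer in program order
        self.oldest = self.free = self.count = 0
        self.regfile = [0] * num_regs       # architectural register file
        self.producer = [None] * num_regs   # ROB tag of latest in-flight writer, if any

    def dispatch(self, opcode, rd, rs1, rs2):
        if self.count == self.size:
            return None                     # ROB full: stall
        tag = self.free                     # tag = index in ROB (Free pointer value)
        srcs = []
        for r in (rs1, rs2):
            t = self.producer[r]
            if t is None:                   # non-busy: read architectural regfile
                srcs.append({"p": True, "tag": None, "data": self.regfile[r]})
            elif self.entries[t]["p"]:      # result already in ROB: bypass it
                srcs.append({"p": True, "tag": None, "data": self.entries[t]["result"]})
            else:                           # busy: copy producer tag, clear p
                srcs.append({"p": False, "tag": t, "data": None})
        self.entries[tag] = {"v": True, "i": False, "opcode": opcode, "src": srcs,
                             "rd": rd, "p": False, "result": None, "except": None}
        self.producer[rd] = tag
        self.free = (self.free + 1) % self.size
        self.count += 1
        return tag

    def issue(self, tag):
        e = self.entries[tag]
        if not all(s["p"] for s in e["src"]):
            return None                     # operands not yet present
        e["i"] = True
        return [s["data"] for s in e["src"]]

    def complete(self, tag, result, exc=None):
        e = self.entries[tag]
        e["result"], e["except"], e["p"] = result, exc, True
        for other in self.entries:          # search source tags, wake up waiters
            if other is None or not other["v"]:
                continue
            for s in other["src"]:
                if not s["p"] and s["tag"] == tag:
                    s["p"], s["data"] = True, result

    def commit(self):
        if self.count == 0:
            return "empty"
        e = self.entries[self.oldest]
        if not e["p"]:
            return "wait"                   # oldest instruction not done yet
        if e["except"] is not None:
            self.flush()                    # trap: nothing younger changes state
            return "trap"
        self.regfile[e["rd"]] = e["result"]
        if self.producer[e["rd"]] == self.oldest:
            self.producer[e["rd"]] = None
        e["v"] = False
        self.oldest = (self.oldest + 1) % self.size
        self.count -= 1
        return "commit"

    def flush(self):
        self.entries = [None] * self.size
        self.free = self.oldest             # set free = oldest
        self.count = 0
        self.producer = [None] * len(self.producer)
```

A younger instruction may call `complete` first, but `commit` returns `"wait"` until the oldest entry has its result, which is what keeps the architectural register file precise.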

# **Managing Rename for Data-in-ROB**

Rename table associated with architectural registers, managed in decode/dispatch

| p | Tag | Value |
|---|-----|-------|
| p | Tag | Value |
| p | Tag | Value |
| p | Tag | Value |
| p | Tag | Value |

One entry per arch. register
- If "p" bit set, then use value in architectural register file
- Else, tag field indicates instruction that will/has produced value
- For dispatch, read source operands <p,tag,value> from arch. regfile, then also read <p,result> from producing instruction in ROB at tag index, bypassing as needed. Copy operands to ROB.
- Write destination arch. register entry with <0,Free,\_>, to assign tag to ROB index of this instruction
- On commit, update arch. regfile with <1,\_,Result> if tag matches, otherwise update with <0,\_,Result>. (Tag value is not updated)
- On trap, reset table (All p=1)
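The table updates above, in particular the tag check at commit, can be sketched as follows. `RegEntry` and the function names are hypothetical, and the ROB itself is left out; only the `<p,tag,value>` bookkeeping is modeled.

```python
from dataclasses import dataclass

@dataclass
class RegEntry:
    p: bool = True    # True: value field holds the current architectural value
    tag: int = 0      # ROB index of the latest in-flight writer (meaningful when p is False)
    value: int = 0

def read_source(table, rs):
    """Dispatch: return ("value", v) if present, else ("tag", rob_index) to wait on."""
    e = table[rs]
    return ("value", e.value) if e.p else ("tag", e.tag)

def rename_dest(table, rd, free_ptr):
    """Dispatch: write <0, Free, _> so later readers wait on this ROB entry."""
    table[rd].p = False
    table[rd].tag = free_ptr

def commit_dest(table, rd, rob_index, result):
    """Commit: always write the result; set p only if no younger writer has renamed rd."""
    e = table[rd]
    e.value = result
    e.p = (e.tag == rob_index)   # tag value itself is not updated

def trap_reset(table):
    """Trap: all in-flight writers are flushed, so every entry is valid again."""
    for e in table:
        e.p = True

table = [RegEntry() for _ in range(4)]
rename_dest(table, 2, 5)           # ROB entry 5 will write register 2
rename_dest(table, 2, 6)           # a younger instruction, ROB entry 6, also writes it
commit_dest(table, 2, 5, 111)      # older writer commits: tag mismatch, p stays 0
```

After the last call a reader of register 2 still gets `("tag", 6)`, because the committed value 111 is already stale with respect to the younger writer.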

#### **Data Movement in Data-in-ROB Design**



## **Unified Physical Register File**

(MIPS R10K, Alpha 21264, Intel Pentium 4 & Sandy/Ivy Bridge)

- Rename all architectural registers into a single physical register file during decode, no register values read
- Functional units read and write from single unified register file holding committed and temporary registers in execute
- Commit only updates mapping of architectural register to physical register, no data movement



## **Lifetime of Physical Registers**

- Physical regfile holds committed and speculative values
- Physical registers decoupled from ROB entries (no data in ROB)

```
ld   x1, (x3)                        ld   P1, (Px)
addi x3, x1, #4                      addi P2, P1, #4
sub  x6, x7, x9                      sub  P3, Py, Pz
add  x3, x3, x6         Rename       add  P4, P2, P3
ld   x6, (x1)                        ld   P5, (P1)
add  x6, x6, x3                      add  P6, P5, P4
sd   x6, (x1)                        sd   P6, (P1)
ld   x6, (x11)                       ld   P7, (Pw)
```

When can we reuse a physical register?

When next writer of same architectural register commits
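A minimal sketch of this lifetime rule, assuming a rename table, a free list, and a ROB that records `(Rd, LPRd, PRd)` per instruction. The `Renamer` class and its methods are hypothetical names; values, execution, and stores (which have no destination) are not modeled.

```python
from collections import deque

class Renamer:
    def __init__(self, num_arch, num_phys):
        self.table = list(range(num_arch))             # arch reg i starts in physical reg i
        self.free = deque(range(num_arch, num_phys))   # free list
        self.rob = deque()                             # (rd, lprd, prd) in program order

    def rename(self, rd, rs1, rs2):
        if not self.free:
            return None                                # no free registers: stall
        pr1, pr2 = self.table[rs1], self.table[rs2]    # sources see the newest mapping
        lprd = self.table[rd]                          # previous mapping of rd (third read port)
        prd = self.free.popleft()                      # allocate new physical destination
        self.table[rd] = prd
        self.rob.append((rd, lprd, prd))
        return prd, pr1, pr2

    def commit(self):
        rd, lprd, prd = self.rob.popleft()
        self.free.append(lprd)   # this writer of rd has committed: the old register is dead
        return lprd

r = Renamer(num_arch=4, num_phys=6)
first = r.rename(rd=3, rs1=1, rs2=1)    # x3 <- P4, previous mapping P3 remembered
second = r.rename(rd=3, rs1=3, rs2=0)   # reads P4, x3 <- P5, previous mapping P4
stalled = r.rename(rd=1, rs1=0, rs2=0)  # free list is empty here
freed = r.commit()                      # first instruction commits and frees P3
```

The register is released at commit of the overwriting instruction rather than at its issue because, until then, a trap could still require the old value.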



| Physical Regs | Contents | p |
|---------------|----------|---|
| P0            |          |   |
| P1            |          |   |
| P2            |          |   |
| P3            |          |   |
| P4            |          |   |
| P5            | `<x6>`   | p |
| P6            | `<x7>`   | p |
| P7            | `<x3>`   | p |
| P8            | `<x1>`   | p |
| ...           |          |   |
| Pn            |          |   |

Free List: P0, P1, P3

**ROB** 

| use | ex | ор | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|----|----|-----|----|-----|----|------|-----|
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |

(LPRd requires third read port on Rename Table for each instruction)















# Relationship between precise interrupts and speculation:



- Speculation is a form of guessing
  - Branch prediction, data prediction
  - If we speculate and are wrong, need to back up and restart execution to point at which we predicted incorrectly
  - This is exactly same as precise exceptions!
- Branch prediction is very important!
  - Need to "take our best shot" at predicting branch direction.
  - If we issue multiple instructions per cycle, lose lots of potential instructions otherwise:
    - Consider 4 instructions per cycle
    - If it takes a single cycle to decide on a branch, waste from 4 to 7 instruction slots!
- Technique for both precise interrupts/exceptions and speculation: in-order completion or commit
  - This is why reorder buffers are in all new processors

# Quick Recap: Explicit Register Renaming



- Make use of a physical register file that is larger than number of registers specified by ISA
- Keep a translation table:
  - ISA register => physical register mapping
  - When register is written, replace table entry with new register from freelist.
  - Physical register becomes free when not being used by any instructions in progress.







- Physical register file larger than ISA register file
- On issue, each instruction that modifies a register is allocated new physical register from freelist
- Used on: R10000, Alpha 21264, HP PA8000





- Note that physical register P0 is "dead" (or not "live") past the point of this load.
  - When we go to commit the load, we free up











**R10000 Freelist Management** 












## **Advantages of Explicit Renaming**

- Decouples renaming from scheduling:
  - Pipeline can be exactly like "standard" DLX pipeline (perhaps with multiple operations issued per cycle)
  - Or, pipeline could be Tomasulo-like or a scoreboard, etc.
  - Standard forwarding or bypassing could be used
- Allows data to be fetched from single register file
  - No need to bypass values from reorder buffer
  - This can be important for balancing pipeline
- Many processors use a variant of this technique:
  - R10000, Alpha 21264, HP PA8000
- Another way to get precise interrupt points:
  - All that needs to be "undone" for precise break point is to undo the table mappings
  - Provides an interesting mix between reorder buffer and future file
    - Results are written immediately back to register file
    - Register names are "freed" in program order (by ROB)

#### **Repairing Rename at Traps**

- MIPS R10K rename table is repaired by unrenaming instructions in reverse order using the PRd/LPRd fields
- Alpha 21264 had similar physical register file scheme, but kept complete rename table snapshots for each instruction in ROB (80 snapshots total)
  - Flash copy all bits from snapshot to active table in one cycle
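Both repair styles can be sketched in a few lines. This is an illustrative model only: the ROB is a Python list of dicts with `rd`, `lprd`, and `prd` keys, `first_squashed` is the ROB position of the oldest instruction to discard, and the function names are hypothetical.

```python
def repair_by_unrenaming(table, free_list, rob, first_squashed):
    """R10K style: walk squashed instructions youngest first, undoing each mapping."""
    for entry in reversed(rob[first_squashed:]):
        table[entry["rd"]] = entry["lprd"]    # restore the previous mapping
        free_list.append(entry["prd"])        # the allocated register is free again
    del rob[first_squashed:]

def repair_by_snapshot(table, snapshots, first_squashed):
    """21264 style: copy back the table saved just before the squashed instruction renamed."""
    table[:] = snapshots[first_squashed]

# Three renamed instructions: x1 <- P4, x2 <- P5, x1 <- P6
table = [0, 6, 5, 3]
rob = [
    {"rd": 1, "lprd": 1, "prd": 4},
    {"rd": 2, "lprd": 2, "prd": 5},
    {"rd": 1, "lprd": 4, "prd": 6},
]
snapshots = [[0, 1, 2, 3], [0, 4, 2, 3], [0, 4, 5, 3]]
free_list = []

table_b = list(table)
repair_by_unrenaming(table, free_list, rob, first_squashed=1)
repair_by_snapshot(table_b, snapshots, first_squashed=1)
```

Both calls leave the table at `[0, 4, 2, 3]`. The reverse walk takes time proportional to the number of squashed instructions, while the snapshot restore is a single copy at the cost of storing one table per instruction.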

# Reorder Buffer Holds Active Instructions (Decoded but not Committed)



Cycle *t* 

**Cycle** *t* + *1* 

#### **Separate Issue Window from ROB**

The issue window holds only instructions that have been decoded and renamed but not issued into execution. Has register tags and presence bits, and pointer to ROB entry.

| use | ex | ор | p1 | PR1 | p2 | PR2 | PRd | ROB# |
|-----|----|----|----|-----|----|-----|-----|------|
|     |    |    |    |     |    |     |     |      |
|     |    |    |    |     |    |     |     |      |
|     |    |    |    |     |    |     |     |      |
|     |    |    |    |     |    |     |     |      |

ROB is usually several times larger than issue window – why?

## **Superscalar Register Renaming**

- During decode, instructions allocated new physical destination register
- Source operands renamed to physical register with newest value
- Execution unit only sees physical register numbers



Does this work?

## **Superscalar Register Renaming**



MIPS R10K renames 4 serially-RAW-dependent insts/cycle




# **Superscalar Register Renaming (Try #2)**



MIPS R10K renames 4 serially-RAW-dependent insts/cycle
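Since the figure for this slide is missing, the following is only one plausible reading of what renaming several dependent instructions in one cycle requires: every source is compared against the destinations of older instructions in the same group, and the newest match overrides the rename table lookup. The function name and data layout are hypothetical, and the free list is assumed to hold enough registers for the whole group.

```python
def rename_group(table, free_list, group):
    """Rename a group of (rd, rs1, rs2) instructions, given in program order, in one step.

    Returns a list of (prd, pr1, pr2) physical register numbers.
    """
    # All table reads happen "in parallel", before any write of this cycle.
    looked_up = [(table[rs1], table[rs2]) for _, rs1, rs2 in group]
    new_dests = [free_list.pop(0) for _ in group]
    renamed = []
    for i, (rd, rs1, rs2) in enumerate(group):
        pr1, pr2 = looked_up[i]
        for j in range(i):                 # comparators against older group members
            if group[j][0] == rs1:
                pr1 = new_dests[j]         # later j overwrites: newest producer wins
            if group[j][0] == rs2:
                pr2 = new_dests[j]
        renamed.append((new_dests[i], pr1, pr2))
    for (rd, _, _), prd in zip(group, new_dests):
        table[rd] = prd                    # the last writer in the group sets the mapping
    return renamed

table = [0, 1, 2, 3]
free_list = [4, 5, 6, 7]
# x1 <- x2 op x3 ; x2 <- x1 op x1 ; x1 <- x2 op x1  (serially RAW-dependent)
renamed = rename_group(table, free_list, [(1, 2, 3), (2, 1, 1), (1, 2, 1)])
```

Without the inner comparison loop, the second and third instructions would read the stale mappings P1 and P2 from the table, which is the failure that the "Does this work?" question points at.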



## **Summary**

- DataFlow view:
  - Data triggers execution rather than instructions triggering data
- Dynamic hardware schemes can unroll loops dynamically in hardware
  - Form of limited dataflow
  - Register renaming is essential
- Explicit Renaming: more physical registers than needed by ISA.
  - Rename table: tracks current association between architectural registers and physical registers
  - Uses a translation table to perform compiler-like transformation on the fly
- Precise Interrupts:
  - Must commit things back in order
  - Reorder buffer: temporarily holds results until commit possible
  - Toss out things to achieve precise interrupt point

#### **Acknowledgements**

- This course is partly inspired by previous MIT 6.823 and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
  - Arvind (MIT)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)