# CS252 Graduate Computer Architecture Lecture 7

Scoreboard, Tomasulo, Register Renaming

February 8th, 2012

#### John Kubiatowicz

Electrical Engineering and Computer Sciences
University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252

#### **Recall: Revised FP Loop Minimizing Stalls**

```
Loop: LD   F0,0(R1)    ; clock 1
      stall            ; clock 2
      ADDD F4,F0,F2    ; clock 3
      SUBI R1,R1,8     ; clock 4
      BNEZ R1,Loop     ; clock 5, delayed branch
      SD   8(R1),F4    ; clock 6, offset altered when moved past SUBI
```

#### Swap BNEZ and SD by changing address of SD

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |

6 clocks per iteration: can we unroll the loop 4 times to make the code faster?
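A minimal C sketch of the 4-way unrolling the slide asks about. It assumes, as in the standard textbook version of this loop, that the code computes `x[i] = x[i] + s` walking down the array (R1 is the pointer, F2 holds `s`); the function names are hypothetical. The separate temporaries `a0`..`a3` play the role of distinct registers, so the four load/add/store chains are independent.

```c
/* Rolled loop: one load, add, store, index update and branch per element. */
void add_scalar(double *x, int n, double s)
{
    for (int i = n - 1; i >= 0; i--)
        x[i] = x[i] + s;
}

/* Unrolled 4 times: one index update and one branch per four elements.
   a0..a3 are separate temporaries, so the four load/add/store chains are
   independent and can be interleaved to hide the load and add latencies. */
void add_scalar_unroll4(double *x, int n, double s)
{
    int i = n - 1;
    for (; i >= 3; i -= 4) {
        double a0 = x[i], a1 = x[i - 1], a2 = x[i - 2], a3 = x[i - 3];
        x[i]     = a0 + s;
        x[i - 1] = a1 + s;
        x[i - 2] = a2 + s;
        x[i - 3] = a3 + s;
    }
    for (; i >= 0; i--)      /* cleanup when n is not a multiple of 4 */
        x[i] = x[i] + s;
}
```

Unrolling removes three of every four SUBI/BNEZ pairs, and the independent chains give the scheduler instructions to place in the load and add latency slots.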

2/08/2012 cs252-S12, Lecture07 2

#### **Recall: Software Pipelining Example**

Software pipelining fills and drains the pipeline only once per loop, vs. once per each unrolled iteration in loop unrolling.

#### 5 cycles per iteration
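The same loop in software-pipelined form, as a source-level sketch (real software pipelining reorders machine instructions, and `add_scalar_swp` is a hypothetical name). Each kernel iteration stores element i, adds for element i-1 and loads element i-2, so the operations inside one iteration do not depend on each other; the prologue and epilogue fill and drain the pipeline once for the whole loop.

```c
/* Software-pipelined form of: for (i = n-1; i >= 0; i--) x[i] = x[i] + s; */
void add_scalar_swp(double *x, int n, double s)
{
    if (n < 2) {                 /* too short to pipeline */
        if (n == 1)
            x[0] = x[0] + s;
        return;
    }
    /* Prologue: start the first two iterations. */
    double sum = x[n - 1] + s;   /* add for element n-1  */
    double loaded = x[n - 2];    /* load for element n-2 */

    /* Kernel: at the top, sum == old x[i] + s and loaded == old x[i-1]. */
    for (int i = n - 1; i >= 2; i--) {
        x[i] = sum;              /* store, iteration i   */
        sum = loaded + s;        /* add,   iteration i-1 */
        loaded = x[i - 2];       /* load,  iteration i-2 */
    }

    /* Epilogue: drain the last two iterations. */
    x[1] = sum;
    x[0] = loaded + s;
}
```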

#### **Trace Scheduling in VLIW**

- Problem: need large blocks of instructions w/o branches
  - Only way to be able to find groups of unrelated instructions
  - Dynamic branch prediction not an option
- Parallelism across IF branches vs. LOOP branches
- Two steps:
  - Trace Selection
    - Find a likely sequence of basic blocks (the trace), statically predicted or profile predicted, that forms a long sequence of straight-line code
  - Trace Compaction
    - Squeeze the trace into a few VLIW instructions
    - Need bookkeeping code in case the prediction is wrong
- This is a form of compiler-generated speculation
  - Compiler must generate "fixup" code to handle cases in which trace is not the taken branch
  - Needs extra registers: undoes bad guess by discarding
- Subtle compiler bugs mean wrong answer vs. poorer performance; no hardware interlocks
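A source-level illustration of the fixup idea, with hypothetical names (`original`, `trace_scheduled`) and an assumed profile saying `taken` is almost always true. The likely path runs before the branch is resolved, into an extra temporary; when the guess is wrong, off-trace bookkeeping code discards it and redoes the work. Speculated operations must not fault, so this sketch assumes `b + c` cannot overflow.

```c
/* Original code: an if/else followed by common code. */
int original(int b, int c, int d, int taken)
{
    int a;
    if (taken)
        a = b + c;
    else
        a = d;
    return a * 2;
}

/* Trace-scheduled version: the likely path is compacted into straight-line
   code executed before the branch, using an extra temporary t. */
int trace_scheduled(int b, int c, int d, int taken)
{
    int t = b + c;        /* speculative: hoisted above the branch */
    int y = t * 2;        /* rest of the trace uses the speculative value */
    if (!taken)
        y = d * 2;        /* fixup code: discard the bad guess and recompute */
    return y;
}
```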


#### When Safe to Unroll Loop?

 Example: Where are data dependencies? (A,B,C distinct & nonoverlapping)

```
for (i=0; i<100; i=i+1) {
    A[i+1] = A[i] + C[i];     /* S1 */
    B[i+1] = B[i] + A[i+1];    /* S2 */
}
```

1. S2 uses the value, A[i+1], computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1], which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1].

This is a "loop-carried dependence": a dependence between iterations.

- For our prior example, each iteration was distinct
  - In this case, iterations can't be executed in parallel, Right????


#### Can we use HW to get CPI closer to 1?

- Why in HW at run time?
  - Works when can't know real dependence at compile time
  - Compiler simpler
  - Code for one machine runs well on another
- Key idea: Allow instructions behind stall to proceed

```
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
```

Out-of-order execution => out-of-order completion.

#### Does a loop-carried dependence mean there is no parallelism???

- Consider:


- Relies on the associative nature of "+".
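The slide's example loop did not survive extraction, so this sketch assumes a hypothetical reduction loop of the form `sum = sum + c[i]`. The sequential version carries a dependence on `sum` from every iteration to the next; regrouping the additions into four partial sums removes it, which is legal only because + is associative. Integer addition is used here; floating-point addition is not exactly associative, so reassociating it can change the rounded result.

```c
/* Sequential reduction: each iteration needs the previous value of sum. */
long sum_sequential(const long *c, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum = sum + c[i];           /* loop-carried dependence on sum */
    return sum;
}

/* Reassociated reduction: four independent partial sums, so the four
   additions in one iteration can proceed in parallel; they are combined
   in a small tree at the end. */
long sum_reassociated(const long *c, int n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += c[i];
        s1 += c[i + 1];
        s2 += c[i + 2];
        s3 += c[i + 3];
    }
    for (; i < n; i++)              /* leftover elements */
        s0 += c[i];
    return (s0 + s1) + (s2 + s3);
}
```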


#### **Problems?**

- How do we prevent WAR and WAW hazards?
- How do we deal with variable latency?
  - Forwarding for RAW hazards harder.

Clock Cycle Number:

| Instruction    | 1  | 2  | 3  | 4   | 5     | 6   | 7     | 8     | 9     | 10    | 11    | 12    | 13    | 14    | 15    | 16  | 17 |
|----------------|----|----|----|-----|-------|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-----|----|
| LD F6,34(R2)   | IF | ID | EX | MEM | WB    |     |       |       |       |       |       |       |       |       |       |     |    |
| LD F2,45(R3)   |    | IF | ID | EX  | MEM   | WB  |       |       |       |       |       |       |       |       |       |     |    |
| MULTD F0,F2,F4 |    |    | IF | ID  | stall | M1  | M2    | M3    | M4    | M5    | M6    | M7    | M8    | M9    | M10   | MEM | WB |
| SUBD F8,F6,F2  |    |    |    | IF  | ID    | A1  | A2    | MEM   | WB    |       |       |       |       |       |       |     |    |
| DIVD F10,F0,F6 |    |    |    |     | IF    | ID  | stall | stall | stall | stall | stall | stall | stall | stall | stall | D1  | D2 |
| ADDD F6,F8,F2  |    |    |    |     |       | IF  | ID    | A1    | A2    | MEM   | WB    |       |       |       |       |     |    |

RAW: DIVD waits for F0 from MULTD. WAR: ADDD writes F6 (cycle 11) before the stalled DIVD has read F6.
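The hazard types in the questions above can be stated as a small check on register names. This sketch assumes each instruction has one destination and up to two source registers (`-1` for an unused or untracked operand, e.g. an integer base register) and that `a` precedes `b` in program order; `Reg3`, `Hazards` and `classify` are hypothetical names.

```c
#include <stdbool.h>

/* Register fields of one instruction: -1 marks an unused or untracked operand. */
typedef struct { int dest, src1, src2; } Reg3;

typedef struct { bool raw, war, waw; } Hazards;

/* Hazards from earlier instruction a to later instruction b. */
Hazards classify(Reg3 a, Reg3 b)
{
    Hazards h;
    h.raw = a.dest >= 0 && (b.src1 == a.dest || b.src2 == a.dest); /* b reads what a writes */
    h.war = b.dest >= 0 && (a.src1 == b.dest || a.src2 == b.dest); /* b overwrites what a reads */
    h.waw = a.dest >= 0 && a.dest == b.dest;                       /* both write the same register */
    return h;
}
```

Applied to the sequence above, MULTD then DIVD is RAW on F0, DIVD then ADDD is WAR on F6, and the first LD then ADDD is WAW on F6.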

How to get precise exceptions?


#### Scoreboard: a bookkeeping technique

- Out-of-order execution divides ID stage:
  - 1. Issue—decode instructions, check for structural hazards
  - 2. Read operands—wait until no data hazards, then read operands
- Scoreboards date to CDC6600 in 1963
  - Readings for Monday include one on CDC6600
- Instructions execute whenever not dependent on previous instructions and no hazards.
- CDC 6600: in-order issue, out-of-order execution, out-of-order commit (or completion)
  - No forwarding!
  - Imprecise interrupt/exception model for now


## **Scoreboard Implications**

- Out-of-order completion => WAR, WAW hazards?
- Solutions for WAR:
  - Stall writeback until registers have been read
  - Read registers only during Read Operands stage
- Solution for WAW:
  - Detect hazard and stall issue of new instruction until other instruction completes
- No register renaming
- Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
- Scoreboard keeps track of dependencies between instructions that have already issued
- Scoreboard replaces ID, EX, WB with 4 stages

#### **Scoreboard Architecture (CDC 6600)**



## Four Stages of Scoreboard Control

- Issue—decode instructions & check for structural hazards (ID1)
  - Instructions issued in program order (for hazard checking)
  - Don't issue if structural hazard
  - Don't issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
- Read operands—wait until no data hazards, then read operands (ID2)
  - All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  - No forwarding of data in this model!


#### **Four Stages of Scoreboard Control**

- Execution—operate on operands (EX)
  - The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
- Write result—finish execution (WB)
  - Stall until no WAR hazards with previous instructions:

    - Example: in the sequence `DIVD F0,F2,F4`; `ADDD F10,F0,F8`; `SUBD F8,F8,F14`, the CDC 6600 scoreboard would stall SUBD until ADDD reads its operands, because SUBD overwrites F8, which ADDD still needs to read.


#### Three Parts of the Scoreboard

- Instruction status: which of the 4 steps the instruction is in
- Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
  - Busy: indicates whether the unit is busy or not
  - Op: operation to perform in the unit (e.g., + or -)
  - Fi: destination register
  - Fj, Fk: source-register numbers
  - Qj, Qk: functional units producing source registers Fj, Fk
  - Rj, Rk: flags indicating when Fj, Fk are ready
- Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register


## **Scoreboard Example**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 |       |               |               |              |
| LD          | F2   | 45+ | R3 |       |               |               |              |
| MULTD       | F0   | F2  | F4 |       |               |               |              |
| SUBD        | F8   | F6  | F2 |       |               |               |              |
| DIVD        | F10  | F0  | F6 |       |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|----|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |    |           |         |         |         |         |          |          |
| Mult1     | No   |    |           |         |         |         |         |          |          |
| Mult2     | No   |    |           |         |         |         |         |          |          |
| Add       | No   |    |           |         |         |         |         |          |          |
| Divide    | No   |    |           |         |         |         |         |          |          |

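#### Register result status:

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
|       | FU |    |    |    |    |    |     |     |     |     |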
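The four stage conditions can be checked by replaying this example. The sketch below is a hypothetical timing model, not the CDC 6600 logic: it records the cycle of each step per instruction and decides every step in cycle t only from events in earlier cycles. All names are illustrative, and the latencies are assumptions (load 1, add 2 and multiply 10 match the cycle numbers in the tables that follow; 40 for divide is the usual textbook value).

```c
#include <stdbool.h>

/* Unit classes and the five unit instances: Integer, Mult1, Mult2, Add, Divide. */
enum FuClass { FU_INT, FU_MULT, FU_ADD, FU_DIV };
#define NUM_UNITS 5
static const enum FuClass unit_class[NUM_UNITS] = { FU_INT, FU_MULT, FU_MULT, FU_ADD, FU_DIV };

typedef struct {
    const char *text;
    enum FuClass cls;
    int latency;                   /* execution cycles (assumed values) */
    int dest, src1, src2;          /* FP register numbers; -1 = none or integer register */
    int issue, read, comp, write;  /* cycle of each step; 0 = not reached yet */
    int unit;                      /* unit instance holding the instruction; -1 = none */
    int prod1, prod2;              /* producer of src1/src2 at issue (like Qj, Qk); -1 = none */
} Instr;

/* Has instruction j written its result in a cycle before t? (-1 = no producer) */
static bool written_before(const Instr *in, int j, int t)
{
    return j < 0 || (in[j].write != 0 && in[j].write < t);
}

/* Issued but not yet written back, as seen at the start of cycle t. */
static bool in_flight(const Instr *x, int t)
{
    return x->issue != 0 && (x->write == 0 || x->write >= t);
}

/* Register result status: index of the in-flight instruction that will write reg. */
static int pending_writer(const Instr *in, int n, int reg, int t)
{
    for (int j = 0; reg >= 0 && j < n; j++)
        if (in[j].dest == reg && in_flight(&in[j], t))
            return j;
    return -1;
}

/* First unit of class c not held by an in-flight instruction, or -1 (structural hazard). */
static int free_unit(const Instr *in, int n, enum FuClass c, int t)
{
    for (int u = 0; u < NUM_UNITS; u++) {
        bool busy = false;
        for (int j = 0; j < n; j++)
            busy = busy || (in[j].unit == u && in_flight(&in[j], t));
        if (unit_class[u] == c && !busy)
            return u;
    }
    return -1;
}

/* WAR: some issued instruction has our destination as a ready but unread operand. */
static bool war_blocked(const Instr *in, int n, int self, int t)
{
    int reg = in[self].dest;
    for (int f = 0; f < n; f++) {
        if (f == self || in[f].issue == 0 || in[f].issue >= t)
            continue;
        if (in[f].read != 0 && in[f].read < t)
            continue;                                  /* already read its operands */
        if ((in[f].src1 == reg && written_before(in, in[f].prod1, t)) ||
            (in[f].src2 == reg && written_before(in, in[f].prod2, t)))
            return true;
    }
    return false;
}

void simulate(Instr *in, int n, int max_cycles)
{
    for (int i = 0; i < n; i++) {
        in[i].issue = in[i].read = in[i].comp = in[i].write = 0;
        in[i].unit = in[i].prod1 = in[i].prod2 = -1;
    }
    int next = 0;                                      /* next instruction to issue, in order */
    for (int t = 1; t <= max_cycles; t++) {
        for (int i = 0; i < n; i++)                    /* write result: WAR check */
            if (in[i].comp != 0 && in[i].comp < t && in[i].write == 0 && !war_blocked(in, n, i, t))
                in[i].write = t;
        for (int i = 0; i < n; i++)                    /* read operands: RAW check */
            if (in[i].issue != 0 && in[i].issue < t && in[i].read == 0 &&
                written_before(in, in[i].prod1, t) && written_before(in, in[i].prod2, t)) {
                in[i].read = t;
                in[i].comp = t + in[i].latency;        /* execution complete */
            }
        if (next < n) {                                /* issue: structural and WAW checks */
            Instr *x = &in[next];
            int u = free_unit(in, n, x->cls, t);
            if (u >= 0 && pending_writer(in, n, x->dest, t) < 0) {
                x->prod1 = pending_writer(in, n, x->src1, t);
                x->prod2 = pending_writer(in, n, x->src2, t);
                x->unit = u;
                x->issue = t;
                next++;
            }
        }
    }
}
```

Filling an `Instr` array with the six instructions (for example `{"MULTD F0,F2,F4", FU_MULT, 10, 0, 2, 4}`) and calling `simulate(prog, 6, 100)` gives issue/read/complete/write cycles of 1/2/3/4 and 5/6/7/8 for the loads, 6/9/19/20 for MULTD, 7/9/11/12 for SUBD and 13/14/16/22 for ADDD. ADDD's write waits until the cycle after DIVD reads F6 (cycle 21 under the assumed latencies).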

## **Detailed Scoreboard Pipeline Control**

| Instruction status | Wait until | Bookkeeping |
|--------------------|------------|-------------|
| Issue | Not Busy(FU)<br>and not Result(D) | Busy(FU) ← Yes; Op(FU) ← op;<br>Fi(FU) ← D; Fj(FU) ← S1;<br>Fk(FU) ← S2; Qj ← Result(S1);<br>Qk ← Result(S2); Rj ← not Qj;<br>Rk ← not Qk; Result(D) ← FU |
| Read operands | Rj and Rk | Rj ← No; Rk ← No |
| Execution complete | Functional unit done | |
| Write result | ∀f ((Fj(f) ≠ Fi(FU)<br>or Rj(f) = No) &<br>(Fk(f) ≠ Fi(FU) or<br>Rk(f) = No)) | ∀f (if Qj(f) = FU then Rj(f) ← Yes);<br>∀f (if Qk(f) = FU then Rk(f) ← Yes);<br>Result(Fi(FU)) ← 0; Busy(FU) ← No |
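The table maps almost line for line onto code. This is a sketch under stated assumptions: the unit numbering, `NO_FU` for a blank entry, `-1` for source operands the scoreboard does not track (such as integer base registers), and all function names are hypothetical. The caller is assumed to invoke the steps in order for each instruction and to handle execution timing. One addition beyond the table is marked in a comment: Qj and Qk are cleared when operands are read, so a stale producer tag cannot re-mark an operand that was already consumed.

```c
#include <stdbool.h>

enum { FU_INTEGER, FU_MULT1, FU_MULT2, FU_ADD, FU_DIVIDE, NUM_FU };
#define NUM_REGS 32   /* F0..F31 */
#define NO_FU   (-1)  /* blank Qj/Qk or register result status entry */

/* Functional unit status: the 9 fields from the slides. */
typedef struct {
    bool busy;
    int op;
    int fi, fj, fk;   /* destination and source registers (-1 = not tracked) */
    int qj, qk;       /* units producing Fj, Fk */
    bool rj, rk;      /* Fj, Fk ready and not yet read */
} FuStatus;

typedef struct {
    FuStatus fu[NUM_FU];
    int result[NUM_REGS];  /* register result status: unit that will write each reg */
} Scoreboard;

void sb_init(Scoreboard *sb)
{
    for (int f = 0; f < NUM_FU; f++)
        sb->fu[f] = (FuStatus){ .busy = false, .qj = NO_FU, .qk = NO_FU };
    for (int r = 0; r < NUM_REGS; r++)
        sb->result[r] = NO_FU;
}

static int producer(const Scoreboard *sb, int reg)
{
    return reg < 0 ? NO_FU : sb->result[reg];
}

/* Issue: wait until not Busy(FU) and not Result(D). */
bool sb_issue(Scoreboard *sb, int fu, int op, int d, int s1, int s2)
{
    FuStatus *u = &sb->fu[fu];
    if (u->busy || sb->result[d] != NO_FU)
        return false;                       /* structural or WAW hazard */
    u->busy = true;  u->op = op;
    u->fi = d;  u->fj = s1;  u->fk = s2;
    u->qj = producer(sb, s1);  u->qk = producer(sb, s2);
    u->rj = (u->qj == NO_FU);  u->rk = (u->qk == NO_FU);
    sb->result[d] = fu;
    return true;
}

/* Read operands: wait until Rj and Rk. */
bool sb_read_operands(Scoreboard *sb, int fu)
{
    FuStatus *u = &sb->fu[fu];
    if (!u->busy || !u->rj || !u->rk)
        return false;                       /* RAW hazard: operand not ready */
    u->rj = u->rk = false;
    u->qj = u->qk = NO_FU;                  /* assumption: clear stale producer tags */
    return true;
}

/* Write result: wait until no unit still needs the old value of Fi(FU). */
bool sb_write_result(Scoreboard *sb, int fu)
{
    int d = sb->fu[fu].fi;
    for (int f = 0; f < NUM_FU; f++) {
        const FuStatus *o = &sb->fu[f];
        if (f != fu && o->busy && ((o->fj == d && o->rj) || (o->fk == d && o->rk)))
            return false;                   /* WAR hazard */
    }
    for (int f = 0; f < NUM_FU; f++) {
        if (sb->fu[f].qj == fu) sb->fu[f].rj = true;
        if (sb->fu[f].qk == fu) sb->fu[f].rk = true;
    }
    sb->result[d] = NO_FU;
    sb->fu[fu].busy = false;
    return true;
}
```

Replaying part of the example: after issuing MULTD F0,F2,F4 to Mult1, DIVD F10,F0,F6 to Divide and ADDD F6,F8,F2 to Add, `sb_write_result` refuses to let Add finish while DIVD still needs the old F6, and succeeds once Mult1 has written F0 and DIVD has read its operands.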


#### **Scoreboard Example: Cycle 2**



#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | Yes  | Load | F6        |         | R2      |         |         |          | Yes      |
| Mult1     | No   |      |           |         |         |         |         |          |          |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | No   |      |           |         |         |         |         |          |          |
| Divide    | No   |      |           |         |         |         |         |          |          |

Register result status: Clock

Issue 2nd LD?


## **Scoreboard Example: Cycle 3**





#### Register result status:

| Clock |    | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|---------|----|-----|-----|-----|-----|
| 3     | FU |    |    |    | Integer |    |     |     |     |     |

Issue MULT?








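#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | Yes  | Load | F2        |         | R3      |         |         |          | Yes      |
| Mult1     | No   |      |           |         |         |         |         |          |          |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | No   |      |           |         |         |         |         |          |          |
| Divide    | No   |      |           |         |         |         |         |          |          |

#### Register result status:

| Clock |    | F0 | F2      | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|---------|----|----|----|-----|-----|-----|-----|
| 5     | FU |    | Integer |    |    |    |     |     |     |     |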


## Scoreboard Example: Cycle 6





#### Register result status:

| Clock |    | F0    | F2      | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|-------|---------|----|----|----|-----|-----|-----|-----|
| 6     | FU | Mult1 | Integer |    |    |    |     |     |     |     |


## **Scoreboard Example: Cycle 7**





#### Register result status:

| Clock |    | F0    | F2      | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|-------|----|-------|---------|----|----|-----|-----|-----|-----|-----|
| 7     | FU | Mult1 | Integer |    |    | Add |     |     |     |     |

Read multiply operands?

## **Scoreboard Example: Cycle 8a (First half of clock cycle)**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             |              |
| MULTD       | F0   | F2  | F4 | 6     |               |               |              |
| SUBD        | F8   | F6  | F2 | 7     |               |               |              |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | Yes  | Load | F2        |         | R3      |         |         |          | No       |
| Mult1     | Yes  | Mult | F0        | F2      | F4      | Integer |         | No       | Yes      |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | Yes  | Sub  | F8        | F6      | F2      |         | Integer | Yes      | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2      | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|---------|----|----|-----|--------|-----|-----|-----|
| 8     | FU | Mult1 | Integer |    |    | Add | Divide |     |     |     |


## Scoreboard Example: Cycle 8b (Second half of clock cycle)

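#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| Mult1     | Yes  | Mult | F0        | F2      | F4      |         |         | Yes      | Yes      |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | Yes  | Sub  | F8        | F6      | F2      |         |         | Yes      | Yes      |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |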

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|-----|--------|-----|-----|-----|
| 8     | FU | Mult1 |    |    |    | Add | Divide |     |     |     |


## **Scoreboard Example: Cycle 9**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             |               |              |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 10 Mult1  | Yes  | Mult | F0        | F2      | F4      |         |         | Yes      | Yes      |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 2 Add     | Yes  | Sub  | F8        | F6      | F2      |         |         | Yes      | Yes      |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

Note: Time is the number of execution cycles remaining.

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|-----|--------|-----|-----|-----|
| 9     | FU | Mult1 |    |    |    | Add | Divide |     |     |     |

Read operands for MULT & SUB? Issue ADDD?


## **Scoreboard Example: Cycle 10**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             |               |              |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 9 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 1 Add     | Yes  | Sub  | F8        | F6      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|-----|--------|-----|-----|-----|
| 10    | FU | Mult1 |    |    |    | Add | Divide |     |     |     |

## **Scoreboard Example: Cycle 11**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            |              |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 8 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 0 Add     | Yes  | Sub  | F8        | F6      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|-----|--------|-----|-----|-----|
| 11    | FU | Mult1 |    |    |    | Add | Divide |     |     |     |


## **Scoreboard Example: Cycle 12**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 |       |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 7 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | No   |      |           |         |         |         |         |          |          |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6 | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|----|--------|-----|-----|-----|
| 12    | FU | Mult1 |    |    |    |    | Divide |     |     |     |

Read operands for DIVD?


## **Scoreboard Example: Cycle 13**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    |               |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 6 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | Yes  | Add  | F6        | F8      | F2      |         |         | Yes      | Yes      |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 13    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |


## **Scoreboard Example: Cycle 14**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    | 14            |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 5 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 2 Add     | Yes  | Add  | F6        | F8      | F2      |         |         | Yes      | Yes      |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 14    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

## **Scoreboard Example: Cycle 15**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    | 14            |               |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 4 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 1 Add     | Yes  | Add  | F6        | F8      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 15    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |


## **Scoreboard Example: Cycle 16**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    | 14            | 16            |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 3 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| 0 Add     | Yes  | Add  | F6        | F8      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 16    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |


## **Scoreboard Example: Cycle 17**



#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 17    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

Why not write result of ADD???


## **Scoreboard Example: Cycle 18**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             |               |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    | 14            | 16            |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 1 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | Yes  | Add  | F6        | F8      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 18    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

## **Scoreboard Example: Cycle 19**

#### Instruction status:

| Instruction | Dest | j   | k  | Issue | Read operands | Exec complete | Write result |
|-------------|------|-----|----|-------|---------------|---------------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2             | 3             | 4            |
| LD          | F2   | 45+ | R3 | 5     | 6             | 7             | 8            |
| MULTD       | F0   | F2  | F4 | 6     | 9             | 19            |              |
| SUBD        | F8   | F6  | F2 | 7     | 9             | 11            | 12           |
| DIVD        | F10  | F0  | F6 | 8     |               |               |              |
| ADDD        | F6   | F8  | F2 | 13    | 14            | 16            |              |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Integer   | No   |      |           |         |         |         |         |          |          |
| 0 Mult1   | Yes  | Mult | F0        | F2      | F4      |         |         | No       | No       |
| Mult2     | No   |      |           |         |         |         |         |          |          |
| Add       | Yes  | Add  | F6        | F8      | F2      |         |         | No       | No       |
| Divide    | Yes  | Div  | F10       | F0      | F6      | Mult1   |         | No       | Yes      |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 19 | FU | Mult1 | | | Add | | Divide | | | |


## **Scoreboard Example: Cycle 20**

#### Instruction status:

| Instruction | | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | | | Yes | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | FU | | | | Add | | Divide | | | |


## **Scoreboard Example: Cycle 21**

#### Instruction status:

| Instruction | | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | | | Yes | Yes |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 21 | FU | | | | Add | | Divide | | | |

The WAR hazard is now gone: DIVD reads F6 in cycle 21, so ADDD can write F6 in cycle 22.
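The stall that just ended comes from the scoreboard's Write Result check. Below is a minimal Python sketch of that check. It assumes the functional unit status fields used in the tables above (Fi, Fj, Fk, Rj, Rk), simplified to one record per unit. `FUStatus` and `can_write_result` are hypothetical names for illustration, not CDC 6600 terminology.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (hypothetical model)."""
    name: str
    busy: bool = False
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source register 1
    fk: Optional[str] = None  # source register 2
    rj: bool = False          # Fj ready and not yet read
    rk: bool = False          # Fk ready and not yet read


def can_write_result(unit: FUStatus, table: list[FUStatus]) -> bool:
    """Allow Write Result only if no other busy unit still has to read the
    old contents of unit.fi (otherwise the write would be a WAR hazard)."""
    for other in table:
        if other is unit or not other.busy:
            continue
        if (other.fj == unit.fi and other.rj) or (other.fk == unit.fi and other.rk):
            return False
    return True


# Cycle 20: DIVD has not yet read F6 (Rk = Yes), so ADDD may not write F6.
add = FUStatus("Add", busy=True, fi="F6", fj="F8", fk="F2")
div_c20 = FUStatus("Divide", busy=True, fi="F10", fj="F0", fk="F6", rj=True, rk=True)
# After DIVD reads its operands in cycle 21, Rj and Rk are cleared.
div_after_read = FUStatus("Divide", busy=True, fi="F10", fj="F0", fk="F6")

print(can_write_result(add, [add, div_c20]))         # False
print(can_write_result(add, [add, div_after_read]))  # True
```

The check scans every unit on each cycle. This is why the scoreboard waits on WAR hazards rather than avoiding them.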


## **Scoreboard Example: Cycle 22**

#### Instruction status:

| Instruction | | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | 22 |

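#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | Yes | Div | F10 | F0 | F6 | | | No | No |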

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 22 | FU | | | | | | Divide | | | |

Faster than light computation (skip a couple of cycles)


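## **Scoreboard Example: Cycle 61**

#### Instruction status:

| Instruction | | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 | 61 | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | 22 |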

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| 0 | Divide | Yes | Div | F10 | F0 | F6 | | | No | No |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 61 | FU | | | | | | Divide | | | |


#### **Scoreboard Example: Cycle 62**

#### Instruction status:

| Instruction | | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | 21 | 61 | 62 |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | 22 |

#### Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 62 | FU | | | | | | | | | |


## **Review: Scoreboard Example: Cycle 62**


- In-order issue; out-of-order execute & commit


#### **CDC 6600 Scoreboard**

- Speedup of 1.7 with compiled code and 2.5 with hand-coded code, BUT slow memory (no cache) limits the benefit
- Limitations of 6600 scoreboard:
  - No forwarding hardware
  - Limited to instructions in basic block (small window)
  - Small number of functional units (structural hazards), especially integer/load store units
  - Do not issue on structural hazards
  - Wait for WAR hazards
  - Prevent WAW hazards
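The issue and read-operand stalls in this list can be sketched in a few lines of Python. The sketch assumes a simplified scoreboard model: each unit is either busy or free, and the register result status maps a register to the unit that will write it. The function names and the instructions being tested are hypothetical. The machine state is taken from cycle 19 of the example above.

```python
from __future__ import annotations


def can_issue(unit: str, dest: str,
              unit_busy: dict[str, bool], result_status: dict[str, str]) -> bool:
    """Issue stalls if the unit is busy (structural hazard) or if an
    in-flight instruction will already write dest (WAW hazard)."""
    return not unit_busy[unit] and dest not in result_status


def can_read_operands(rj: bool, rk: bool) -> bool:
    """Read Operands waits until both sources are available (RAW hazard)."""
    return rj and rk


# Scoreboard state at cycle 19.
unit_busy = {"Integer": False, "Mult1": True, "Mult2": False, "Add": True, "Divide": True}
result_status = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

# Hypothetical new instructions arriving at issue:
print(can_issue("Add", "F12", unit_busy, result_status))    # False: Add unit busy
print(can_issue("Mult2", "F6", unit_busy, result_status))   # False: F6 pending (WAW)
print(can_issue("Mult2", "F12", unit_busy, result_status))  # True
# DIVD at cycle 19: Rj = No (waiting for Mult1), Rk = Yes.
print(can_read_operands(False, True))                       # False
```

Because issue is in order, one stalled instruction also blocks every instruction behind it. This is part of why the scoreboard's window is small.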


#### **CS 252 Administrivia**

- Interesting Resource: http://bitsavers.org
  - Has digital versions of users manuals for old machines
  - Quite interesting!
  - I'll link in some of them to your reading pages when it is appropriate
  - Very limited bandwidth: use mirrors such as: http://bitsavers.vt100.net
- Midterm I: March 21<sup>st</sup>
  - Will try to do a 5:00-8:00 slot. Would this work for people?
  - No class that day
  - Pizza afterwards...


## **Tomasulo Organization**



## Another Dynamic Algorithm: Tomasulo Algorithm

- Designed for the IBM 360/91 (1966), about 3 years after the CDC 6600
- Goal: High Performance without special compilers
- Differences between IBM 360 & CDC 6600 ISA
  - IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
  - IBM has 4 FP registers vs. 8 in CDC 6600
  - IBM has memory-register ops
- Why study it? It led to the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...

#### Tomasulo Algorithm vs. Scoreboard

- Control & buffers <u>distributed</u> with Function Units (FU) vs. centralized in scoreboard:
  - FU buffers called "reservation stations"; have pending operands
- Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called <u>register renaming</u>:
  - avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't
- Results to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs
- Loads and stores are treated as FUs with RSs as well
- Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
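Renaming at issue can be sketched as follows. For each source, the sketch reads either a register value or the tag of the station that will produce it, and then points the destination register at the issuing station. Several things are assumptions for illustration: the `issue` helper is hypothetical, the integer address registers of the loads are not renamed, and all six instructions of the example issue before either load writes its result. In the real example the loads finish at cycles 4 and 5, so the tables show values M(A1) and M(A2) instead of tags.

```python
from __future__ import annotations

from typing import Optional, Tuple

# A source operand is either a value (V field) or the tag of the
# reservation station that will produce it (Q field).
Operand = Tuple[Optional[str], Optional[str]]  # (value, tag)


def issue(station: str, dest: str, srcs: list[str],
          reg_status: dict[str, str]) -> list[Operand]:
    """Rename one instruction as it issues (hypothetical helper)."""
    operands: list[Operand] = []
    for reg in srcs:
        tag = reg_status.get(reg)
        # Pending register: wait on its producer's tag; else read the register file.
        operands.append((None, tag) if tag else (f"R({reg})", None))
    # Update dest only after reading sources; later readers of dest wait on this station.
    reg_status[dest] = station
    return operands


reg_status: dict[str, str] = {}
program = [
    ("Load1", "F6", []),             # LD    F6,34(R2)
    ("Load2", "F2", []),             # LD    F2,45(R3)
    ("Mult1", "F0", ["F2", "F4"]),   # MULTD F0,F2,F4
    ("Add1", "F8", ["F6", "F2"]),    # SUBD  F8,F6,F2
    ("Mult2", "F10", ["F0", "F6"]),  # DIVD  F10,F0,F6
    ("Add2", "F6", ["F8", "F2"]),    # ADDD  F6,F8,F2
]
renamed = {st: issue(st, dest, srcs, reg_status) for st, dest, srcs in program}
print(renamed["Mult2"])  # [(None, 'Mult1'), (None, 'Load1')]
print(renamed["Add2"])   # [(None, 'Add1'), (None, 'Load2')]
print(reg_status["F6"])  # Add2
```

DIVD holds the tag Load1 for its F6 operand, so ADDD can retarget F6 to Add2 without a WAR stall. F6 also ends up mapped to Add2, so Load1's later result cannot overwrite it, which removes the WAW hazard.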


#### **Reservation Station Components**

Op: Operation to perform in the unit (e.g., + or -)

Vj, Vk: Values of the source operands

- Store buffers have a V field: the result to be stored

Qj, Qk: Reservation stations producing source registers (value to be written)

- Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready
- Store buffers only have Qi for RS producing result

**Busy:** Indicates reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
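A reservation station with these fields can be modelled as a small record. An operand counts as ready when its Q field is empty, with `None` playing the role of "0"; no separate ready flags are needed. This is a hypothetical Python model: store buffers are omitted, and the station contents come from cycle 6 of the Tomasulo example in this lecture.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """Hypothetical model of one reservation station."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None  # value of source 1 (meaningful only when qj is None)
    vk: Optional[str] = None  # value of source 2 (meaningful only when qk is None)
    qj: Optional[str] = None  # station that will produce source 1
    qk: Optional[str] = None  # station that will produce source 2

    def ready(self) -> bool:
        # No Rj/Rk flags as in the scoreboard: empty Q fields mean "ready".
        return self.busy and self.qj is None and self.qk is None


stations = [
    ReservationStation("Add1", True, "SUBD", vj="M(A1)", vk="M(A2)"),
    ReservationStation("Add2", True, "ADDD", vk="M(A2)", qj="Add1"),
    ReservationStation("Add3"),
    ReservationStation("Mult1", True, "MULTD", vj="M(A2)", vk="R(F4)"),
    ReservationStation("Mult2", True, "DIVD", vk="M(A1)", qj="Mult1"),
]
print([rs.name for rs in stations if rs.ready()])  # ['Add1', 'Mult1']
```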


#### **Tomasulo Example**



#### **Three Stages of Tomasulo Algorithm**

#### 1. Issue: get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

#### 2. Execution: operate on operands (EX)

When both operands ready then execute; if not ready, watch Common Data Bus for result

#### 3. Write result: finish execution (WB)

Write on Common Data Bus to all awaiting units; mark reservation station available

- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if matches expected Functional Unit (produces result)
  - Does the broadcast
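The Write Result stage can be sketched as one broadcast of (source tag, value). Every station whose Qj or Qk matches the tag captures the value, as on a "come from" bus. Every register still mapped to that tag takes the value, and the producing station is freed. This is a hypothetical Python model using plain dicts, with state from the example at the point where Add1 (SUBD) writes its result (cycle 8).

```python
from __future__ import annotations


def write_result(tag: str, value: str, stations: dict[str, dict],
                 reg_status: dict[str, str], regs: dict[str, str]) -> None:
    """Broadcast (tag, value) on the Common Data Bus (hypothetical model)."""
    for rs in stations.values():
        # Each waiting station compares the source tag with its Qj/Qk.
        if rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, None
    # Registers whose pending producer is this station take the value.
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]
    stations[tag]["busy"] = False  # mark the producing station available


def rs(op, vj=None, vk=None, qj=None, qk=None):
    return {"busy": True, "op": op, "vj": vj, "vk": vk, "qj": qj, "qk": qk}


stations = {
    "Add1": rs("SUBD", vj="M(A1)", vk="M(A2)"),
    "Add2": rs("ADDD", vk="M(A2)", qj="Add1"),
    "Mult1": rs("MULTD", vj="M(A2)", vk="R(F4)"),
    "Mult2": rs("DIVD", vk="M(A1)", qj="Mult1"),
}
reg_status = {"F0": "Mult1", "F6": "Add2", "F8": "Add1", "F10": "Mult2"}
regs: dict[str, str] = {}

write_result("Add1", "(M-M)", stations, reg_status, regs)
print(stations["Add2"]["vj"], stations["Add2"]["qj"])  # (M-M) None
print(regs, reg_status)  # {'F8': '(M-M)'} {'F0': 'Mult1', 'F6': 'Add2', 'F10': 'Mult2'}
```

The hardware performs all of these comparisons in parallel in one cycle. The loops here are only a sequential stand-in.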

#### **Tomasulo Example Cycle 1**





Note: Unlike 6600, can have multiple loads outstanding


#### **Tomasulo Example Cycle 3**



- Note: register names are removed ("renamed") in reservation stations; MULT issued (vs. scoreboard)
- Load1 completing; what is waiting for Load1?


#### **Tomasulo Example Cycle 4**

Load2 completing; what is waiting for Load2?



## **Tomasulo Example Cycle 5**




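## **Tomasulo Example Cycle 6**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | | |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | | |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |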

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| 1 | Add1 | Yes | SUBD | M(A1) | M(A2) | | |
| | Add2 | Yes | ADDD | | M(A2) | Add1 | |
| | Add3 | No | | | | | |
| 9 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | FU | Mult1 | M(A2) | | Add2 | Add1 | Mult2 | | | |

#### Issue ADDD here vs. scoreboard?


## **Tomasulo Example Cycle 8**



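#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | M(A2) | | Add2 | (M-M) | Mult2 | | | |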

#### **Tomasulo Example Cycle 7**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | 7 | |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | | |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| 0 | Add1 | Yes | SUBD | M(A1) | M(A2) | | |
| | Add2 | Yes | ADDD | | M(A2) | Add1 | |
| | Add3 | No | | | | | |
| 8 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 7 | FU | Mult1 | M(A2) | | Add2 | Add1 | Mult2 | | | |

- Add1 completing; what is waiting for it?


#### **Tomasulo Example Cycle 9**



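## **Tomasulo Example Cycle 10**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | 10 | |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| 0 | Add2 | Yes | ADDD | (M-M) | M(A2) | | |
| | Add3 | No | | | | | |
| 5 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | FU | Mult1 | M(A2) | | Add2 | (M-M) | Mult2 | | | |

- Add2 completing; what is waiting for it?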

#### **Tomasulo Example Cycle 11**



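#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 11 | FU | Mult1 | M(A2) | | (M-M+M) | (M-M) | Mult2 | | | |

- Write result of ADDD here vs. scoreboard?
- All quick instructions complete in this cycle!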

#### **Tomasulo Example Cycle 12**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 3 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | FU | Mult1 | M(A2) | | (M-M+M) | (M-M) | Mult2 | | | |

## **Tomasulo Example Cycle 13**



## **Tomasulo Example Cycle 14**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 1 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | FU | Mult1 | M(A2) | | (M-M+M) | (M-M) | Mult2 | | | |


## **Tomasulo Example Cycle 15**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | 15 | |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 0 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 15 | FU | Mult1 | M(A2) | | (M-M+M) | (M-M) | Mult2 | | | |

## **Tomasulo Example Cycle 16**

#### Instruction status:

| Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD | F0 | F2 | F4 | 3 | 15 | 16 |
| SUBD | F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD | F10 | F0 | F6 | 5 | | |
| ADDD | F6 | F8 | F2 | 6 | 10 | 11 |

#### Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| 40 | Mult2 | Yes | DIVD | M*F4 | M(A1) | | |

#### Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | FU | M*F4 | M(A2) | | (M-M+M) | (M-M) | Mult2 | | | |

**Faster than light computation** (skip a couple of cycles)

#### **Tomasulo Example Cycle 56**



- Mult2 is completing; what is waiting for it?


## **Tomasulo Example Cycle 57**



- Once again: in-order issue, out-of-order execution and completion.

## **Compare to Scoreboard Cycle 62**



- Why does it take longer on the scoreboard/6600?
  - Structural hazards
  - Lack of forwarding



# Tomasulo v. Scoreboard (IBM 360/91 v. CDC 6600)

| Tomasulo (IBM 360/91) | Scoreboard (CDC 6600) |
|---|---|
| Pipelined functional units (6 load, 3 store, 3 +, 2 x/÷) | Multiple functional units (1 load/store, 1 +, 2 x, 1 ÷) |
| Window size: ≤ 14 instructions | ≤ 5 instructions |
| No issue on structural hazard | Same |
| WAR: renaming avoids | Stall completion |
| WAW: renaming avoids | Stall issue |
| Broadcast results from FU | Write/read registers |
| Control: reservation stations | Central scoreboard |

#### **Recall: Unrolled Loop That Minimizes Stalls**

```
1  Loop: LD   F0,0(R1)
2        LD   F6,-8(R1)
3        LD   F10,-16(R1)
4        LD   F14,-24(R1)
5        ADDD F4,F0,F2
6        ADDD F8,F6,F2
7        ADDD F12,F10,F2
8        ADDD F16,F14,F2
9        SD   0(R1),F4
10       SD   -8(R1),F8
11       SD   -16(R1),F12
12       SUBI R1,R1,#32
13       BNEZ R1,Loop
14       SD   8(R1),F16 ; 8-32 = -24
```

- What assumptions were made when the code was moved?
  - OK to move the store past SUBI even though SUBI changes the register
  - OK to move loads before stores: do they get the right data?
  - When is it safe for the compiler to make such changes?

#### 14 clock cycles, or 3.5 per iteration


## **Tomasulo Loop Example**

```
Loop: LD    F0,0(R1)
      MULTD F4,F0,F2
      SD    0(R1),F4
      SUBI  R1,R1,#8
      BNEZ  R1,Loop
```

- Assume Multiply takes 4 clocks
- Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
- To be clear, will show clocks for SUBI, BNEZ
- In reality, integer instructions run ahead of the FP instructions

## Loop Example

#### Instruction status:

| ITER | Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| 1 | LD | F0 | 0 | R1 | | | |
| 1 | MULTD | F4 | F0 | F2 | | | |
| 1 | SD | F4 | 0 | R1 | | | |
| 2 | LD | F0 | 0 | R1 | | | |
| 2 | MULTD | F4 | F0 | F2 | | | |
| 2 | SD | F4 | 0 | R1 | | | |

#### Load/store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | No | | | | | |

#### Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80 | Fu | | | | | | | | | |




#### **Loop Example Cycle 3**



- Implicit renaming sets up a "DataFlow" graph
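Renaming across iterations is what lets the loop unroll itself in hardware. Each issued LD and MULTD gets a fresh station, so iteration 2's F0 and F4 become Load2 and Mult2 rather than reusing Load1 and Mult1. Below is a minimal Python sketch of that issue-time renaming. It assumes SUBI and BNEZ resolve immediately with the branch taken, that no load or multiply finishes during issue, and that an iteration stops issuing when no station is free. The function name is hypothetical; the station names follow the slides.

```python
from __future__ import annotations


def issue_iterations(n: int, r1: int = 80):
    """Issue up to n loop iterations, renaming F0 and F4 at every issue."""
    free = {
        "load": ["Load1", "Load2", "Load3"],
        "mult": ["Mult1", "Mult2"],
        "store": ["Store1", "Store2", "Store3"],
    }
    reg_status: dict[str, str] = {}
    issued = []

    def source(reg: str) -> str:
        # Tag of the pending producer, else the register-file value.
        return reg_status.get(reg, f"R({reg})")

    for _ in range(n):
        if not (free["load"] and free["mult"] and free["store"]):
            break  # no free station: issue would stall here (structural hazard)
        ld = free["load"].pop(0)
        issued.append((ld, "LD", r1))                              # LD    F0,0(R1)
        reg_status["F0"] = ld
        mul = free["mult"].pop(0)
        issued.append((mul, "MULTD", source("F0"), source("F2")))  # MULTD F4,F0,F2
        reg_status["F4"] = mul
        st = free["store"].pop(0)
        issued.append((st, "SD", r1, source("F4")))                # SD    0(R1),F4
        r1 -= 8                                                    # SUBI  R1,R1,#8
    return issued, reg_status


issued, status = issue_iterations(2)
for entry in issued:
    print(entry)
print(status)  # {'F0': 'Load2', 'F4': 'Mult2'}
```

The printed entries form the dataflow graph: Store2 waits on Mult2, which waits on Load2, independently of the first iteration's chain. With only two multiply stations, a third iteration cannot issue until Mult1 frees up.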

#### **Loop Example Cycle 2**

#### Instruction status:

| ITER | Instruction | | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| 1 | LD | F0 | 0 | R1 | 1 | | |
| 1 | MULTD | F4 | F0 | F2 | 2 | | |
| 1 | SD | F4 | 0 | R1 | | | |
| 2 | LD | F0 | 0 | R1 | | | |
| 2 | MULTD | F4 | F0 | F2 | | | |
| 2 | SD | F4 | 0 | R1 | | | |

#### Load/store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

#### Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

#### Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 80 | Fu | Load1 | | Mult1 | | | | | | |


#### **Loop Example Cycle 4**



- Dispatching SUBI instruction


Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 72 | Fu | Load1 | | Mult1 | | | | | | |

• And the BNEZ instruction is dispatched


#### **Loop Example Cycle 6**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | No | | | | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 72 | Fu | Load2 | | Mult1 | | | | | | |

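• Notice that F0 never sees the load from location 80: when the second LD issued in cycle 6, the register result status for F0 changed from Load1 to Load2, so Load1's value will reach only the reservation station waiting on it (Mult1), not the register file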
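A minimal Python sketch of the renaming rule behind this observation, assuming the register result status is modeled as a dict from register name to the tag of its pending producer. The helpers `issue_writer` and `cdb_write` are hypothetical names, not part of the lecture, and 80.0 / 72.0 are arbitrary stand-ins for the values M[80] and M[72]. A register accepts a broadcast only if its status entry still holds the broadcasting tag.

```python
# Sketch only: register result status as a dict register -> producer tag.
reg_status: dict[str, str] = {}
# Register file; 0.0 is an arbitrary initial value for F0.
regfile: dict[str, float] = {"F0": 0.0}


def issue_writer(dest: str, tag: str) -> None:
    """At issue, the newest instruction writing `dest` claims the register."""
    reg_status[dest] = tag


def cdb_write(tag: str, value: float) -> None:
    """Broadcast a result; a register updates only if it still waits on `tag`."""
    for reg, waiting_on in list(reg_status.items()):
        if waiting_on == tag:
            regfile[reg] = value
            del reg_status[reg]


issue_writer("F0", "Load1")  # cycle 1: LD F0,0(R1) with R1 = 80
issue_writer("F0", "Load2")  # cycle 6: LD F0,0(R1) with R1 = 72 renames F0 again

cdb_write("Load1", 80.0)     # Load1 finishes: F0 is no longer waiting on it
print(regfile["F0"])         # 0.0
cdb_write("Load2", 72.0)     # Load2 finishes (no later LD has renamed F0 here)
print(regfile["F0"])         # 72.0
```

In the actual walkthrough a third LD renames F0 to Load3 in cycle 11, so F0 skips Load2's value as well; the sketch stops at two loads to keep the rule visible.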


#### **Loop Example Cycle 7**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | No | | |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 72 | Fu | Load2 | | Mult2 | | | | | | |

• Register file completely detached from computation: every pending operand is named by a tag (Load1, Load2, Mult1, Mult2), not by a register

• First and second iterations completely overlapped
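The detachment comes from what happens at issue: each source operand is captured either as a value (if the register is ready) or as the tag of its producer, and the destination is then renamed to the issuing station. A minimal Python sketch of that issue step, assuming hypothetical names (`RSEntry`, `read_operand`, `issue`) and using 2.5 as an arbitrary stand-in for R(F2), reproduces the Mult1/Mult2 rows and the register result status of cycle 7:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSEntry:
    """Hypothetical reservation-station record mirroring the slide columns."""
    name: str
    op: str
    vj: Optional[float]  # S1 value, if already available
    vk: Optional[float]  # S2 value, if already available
    qj: Optional[str]    # tag of the station that will produce S1
    qk: Optional[str]    # tag of the station that will produce S2


def read_operand(reg: str, reg_status: dict, regfile: dict):
    """Return (value, None) if `reg` is ready, else (None, producer_tag)."""
    tag = reg_status.get(reg)
    if tag is not None:
        return None, tag
    return regfile[reg], None


def issue(name, op, dest, src1, src2, reg_status, regfile) -> RSEntry:
    """Capture operands (values or tags) first, then rename `dest`."""
    vj, qj = read_operand(src1, reg_status, regfile)
    vk, qk = read_operand(src2, reg_status, regfile)
    reg_status[dest] = name  # later readers of `dest` wait on this station
    return RSEntry(name, op, vj, vk, qj, qk)


regfile = {"F0": 0.0, "F2": 2.5, "F4": 0.0}  # 2.5 stands in for R(F2)
reg_status = {"F0": "Load1"}                   # cycle 1: first LD F0 pending

mult1 = issue("Mult1", "MULTD", "F4", "F0", "F2", reg_status, regfile)  # cycle 2
reg_status["F0"] = "Load2"                     # cycle 6: second LD F0 pending
mult2 = issue("Mult2", "MULTD", "F4", "F0", "F2", reg_status, regfile)  # cycle 7

print(mult1.qj, mult1.vk)  # Load1 2.5
print(mult2.qj, mult2.vk)  # Load2 2.5
print(reg_status)          # {'F0': 'Load2', 'F4': 'Mult2'}
```

Both multiplies hold their F0 operand as a tag, so neither iteration needs the architectural F0 at all.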


#### **Loop Example Cycle 8**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 72 | Fu | Load2 | | Mult2 | | | | | | |


#### **Loop Example Cycle 9**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | Yes | 80 | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 72 | Fu | Load2 | | Mult2 | | | | | | |

• Load1 completing: who is waiting?

• Note: Dispatching SUBI
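"Who is waiting?" is answered by the common data bus: when a unit writes its result, every reservation station or store buffer whose Qj/Qk (or Fu) field holds that unit's tag captures the value. When Load1 writes (cycle 10 in the tables), only Mult1 is waiting on it. A minimal Python sketch, assuming a hypothetical `Waiter` record for both stations and store buffers (the store's Fu field is modeled as `qj`), and using 2.5 and 4.0 as arbitrary stand-ins for R(F2) and M[80]:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Waiter:
    """A reservation station or store buffer (hypothetical fields)."""
    name: str
    qj: Optional[str] = None   # tag this entry is waiting on, if any
    vj: Optional[float] = None
    qk: Optional[str] = None
    vk: Optional[float] = None


def broadcast(tag: str, value: float, waiters: list) -> list:
    """Common data bus: every entry whose Qj or Qk equals `tag` grabs `value`.

    Returns the names of the entries that were waiting on `tag`.
    """
    woken = []
    for w in waiters:
        hit = False
        if w.qj == tag:
            w.vj, w.qj = value, None
            hit = True
        if w.qk == tag:
            w.vk, w.qk = value, None
            hit = True
        if hit:
            woken.append(w.name)
    return woken


# Waiting relationships at cycle 9 (values are illustrative stand-ins).
f2 = 2.5
mult1 = Waiter("Mult1", qj="Load1", vk=f2)
mult2 = Waiter("Mult2", qj="Load2", vk=f2)
store1 = Waiter("Store1", qj="Mult1")  # SD waits on the product from Mult1
store2 = Waiter("Store2", qj="Mult2")
entries = [mult1, mult2, store1, store2]

print(broadcast("Load1", 4.0, entries))  # ['Mult1']
print(mult1.vj, mult1.qj)                # 4.0 None
```

The same call with `"Mult1"` later wakes Store1, which is the answer to the "Mult1 completing" question a few cycles on.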

#### **Loop Example Cycle 10**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | Yes | 72 | |
| Load3 | No | | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 4 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| | Mult2 | Yes | Multd | | R(F2) | Load2 | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 64 | Fu | Load2 | | Mult2 | | | | | | |

• Load2 completing: who is waiting?

• Note: Dispatching BNEZ

#### **Loop Example Cycle 11**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 3 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 4 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Next load in sequence

#### **Loop Example Cycle 12**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 2 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 3 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Why not issue third multiply?
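The tables point to the answer: at cycle 12 both multiply reservation stations (Mult1 and Mult2) are busy, so the third MULTD has no station to enter and issue stalls (a structural hazard). It enters Mult1 only after Mult1 writes its result and frees up; cycle 16 shows it there. A minimal Python sketch of that issue check, assuming a hypothetical busy-bit dict and helper `try_issue`:

```python
from typing import Optional


def try_issue(busy: dict) -> Optional[str]:
    """Claim the first free station of the required type; None means stall."""
    for name, is_busy in busy.items():
        if not is_busy:
            busy[name] = True
            return name
    return None


# Multiply reservation stations at cycle 12: both hold iterations 1 and 2.
mult_busy = {"Mult1": True, "Mult2": True}
print(try_issue(mult_busy))  # None: the third MULTD must stall at issue

mult_busy["Mult1"] = False   # Mult1 writes its result (cycle 15) and is freed
print(try_issue(mult_busy))  # Mult1
```

Because issue is in order, every instruction behind the stalled MULTD waits as well.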


#### **Loop Example Cycle 13**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 1 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 2 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 64 | Fu | Load3 | | Mult2 | | | | | | |


#### **Loop Example Cycle 14**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | Mult1 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 0 | Mult1 | Yes | Multd | M[80] | R(F2) | | |
| 1 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Mult1 completing. Who is waiting?
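The Time column counts down the cycles left in execution. Reading across the tables, Mult1 shows 4 at cycle 10 and 0 at cycle 14, the cycle recorded as Exec Comp for the first MULTD; it writes its result one cycle later (15), and Store1, whose Fu field names Mult1, is the entry waiting for it. A small Python sketch of that bookkeeping, assuming one decrement per cycle (`countdown` is a hypothetical helper):

```python
def countdown(start_cycle: int, start_time: int) -> dict:
    """Map clock cycle -> value in the reservation station's Time column,
    assuming the counter drops by one each cycle until it reaches 0."""
    return {start_cycle + i: start_time - i for i in range(start_time + 1)}


mult1 = countdown(10, 4)          # Mult1 first shows Time = 4 at cycle 10
print(mult1)                      # {10: 4, 11: 3, 12: 2, 13: 1, 14: 0}
exec_complete = max(mult1)        # 14: Time hits 0, matching "Exec Comp"
write_result = exec_complete + 1
print(write_result)               # 15: result goes out on the CDB

mult2 = countdown(11, 4)          # Mult2 first shows Time = 4 at cycle 11
print(max(mult2), max(mult2) + 1) # 15 16
```

The Mult2 figures match the second MULTD's Exec Comp (15) and Write Result (16) entries.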


#### **Loop Example Cycle 15**

Instruction status:

| Iter | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Buffer | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | Mult2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| 0 | Mult2 | Yes | Multd | M[72] | R(F2) | | |

Code: LD F0,0(R1); MULTD F4,F0,F2; SD F4,0(R1); SUBI R1,R1,#8; BNEZ R1,Loop

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 64 | Fu | Load3 | | Mult2 | | | | | | |

• Mult2 completing. Who is waiting?

## **Loop Example Cycle 16**

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | No | | |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|------|------|----|---------|---------|---------|---------|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|---|-----|
| 16 | 64 | Fu | Load3 | | Mult1 | | | | | | |

Code: `LD F0,0(R1)`; `MULTD F4,F0,F2`; `SD 0(R1),F4`; `SUBI R1,R1,#8`; `BNEZ R1,Loop`
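Answering "who is waiting?": when Mult2 (iteration 2's MULTD) writes its result in cycle 16, it broadcasts on the common data bus, and every station or store buffer holding the tag Mult2 captures the value, which is how Store2 ends up with [72]*R2. The Python below is a minimal sketch of that write-result step under stated assumptions, not the 360/91 logic: it assumes Store2 held the tag Mult2 just before the broadcast (the previous cycle's state is assumed, not copied), lets each waiter track only one tag, and uses hypothetical names (`Waiter`, `cdb_broadcast`). It also shows why F4 is not written: its register-status entry already names Mult1 (iteration 3), so the older result goes only to the waiting store.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Waiter:
    """A reservation station or store buffer waiting on at most one tag (simplified)."""
    name: str
    busy: bool
    tag: Optional[str] = None     # producer being waited for (Qj / Fu)
    value: Optional[str] = None   # operand value once captured (Vj / Fu)


def cdb_broadcast(tag: str, value: str, waiters: List[Waiter],
                  reg_status: Dict[str, Optional[str]],
                  regs: Dict[str, str]) -> List[str]:
    """Write-result step: every busy waiter on `tag` captures `value`.

    A register is written only if its status still names `tag`; if a later
    instruction has renamed it, only the waiters receive the value.
    Returns the names of the waiters that captured the value.
    """
    captured = []
    for w in waiters:
        if w.busy and w.tag == tag:
            w.value, w.tag = value, None
            captured.append(w.name)
    for reg, producer in reg_status.items():
        if producer == tag:
            regs[reg] = value
            reg_status[reg] = None
    return captured


# Assumed state just before the cycle-16 write-result (hypothetical).
waiters = [
    Waiter("Mult1", busy=True, tag="Load3"),       # iteration 3 MULTD
    Waiter("Store1", busy=True, value="[80]*R2"),  # already has its data
    Waiter("Store2", busy=True, tag="Mult2"),      # iteration 2 SD
]
reg_status: Dict[str, Optional[str]] = {"F0": "Load3", "F4": "Mult1"}
regs: Dict[str, str] = {}

print(cdb_broadcast("Mult2", "[72]*R2", waiters, reg_status, regs))
# ['Store2']; reg_status["F4"] stays "Mult1" and regs stays empty
```

Checking the register status on broadcast, rather than always writing the register, is what keeps iteration 2's result from clobbering the F4 mapping that iteration 3 already owns.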

## **Loop Example Cycle 17**

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|------|------|----|---------|---------|---------|---------|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|---|-----|
| 17 | 64 | Fu | Load3 | | Mult1 | | | | | | |

Code: `LD F0,0(R1)`; `MULTD F4,F0,F2`; `SD 0(R1),F4`; `SUBI R1,R1,#8`; `BNEZ R1,Loop`
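In cycle 17, Load3 and Store3 are both busy with address 64: iteration 3's LD reads location 64 and its SD writes it, so the store must not update memory before the older load has read it. This is an instance of the load/store disambiguation listed in the summary. The sketch below applies a simple conservative rule under stated assumptions: each buffer entry carries a hypothetical program-order number `seq`; a load may access memory only if no older store to the same address is pending; a store may write only if no older load or store to that address is pending. Real designs differ in detail, and the names here are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemOp:
    name: str
    kind: str   # "load" or "store"
    addr: int
    seq: int    # hypothetical program-order position (smaller = older)


def may_access(op: MemOp, pending: List[MemOp]) -> bool:
    """Conservative address check against older, still-pending memory ops."""
    for other in pending:
        if other is op or other.seq >= op.seq or other.addr != op.addr:
            continue  # only older ops to the same address can conflict
        if op.kind == "load" and other.kind == "store":
            return False  # RAW through memory: wait for the older store
        if op.kind == "store":
            return False  # WAR (older load) or WAW (older store) through memory
    return True


# Busy load/store buffers in the cycle-17 snapshot, in program order.
pending = [
    MemOp("Store1", "store", 80, seq=1),  # iteration 1 SD
    MemOp("Store2", "store", 72, seq=2),  # iteration 2 SD
    MemOp("Load3", "load", 64, seq=3),    # iteration 3 LD
    MemOp("Store3", "store", 64, seq=4),  # iteration 3 SD
]
for op in pending:
    print(op.name, may_access(op, pending))
# Store1 True, Store2 True, Load3 True, Store3 False
```

In this particular snapshot Store3 would wait anyway, since its data is still tagged Mult1; the address check matters when a store's data arrives before an older load to the same address has completed.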


## **Loop Example Cycle 18**

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | | |

Load/store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | Yes | 80 | [80]*R2 |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|------|------|----|---------|---------|---------|---------|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|---|-----|
| 18 | 64 | Fu | Load3 | | Mult1 | | | | | | |

Code: `LD F0,0(R1)`; `MULTD F4,F0,F2`; `SD 0(R1),F4`; `SUBI R1,R1,#8`; `BNEZ R1,Loop`

## **Loop Example Cycle 19**

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | 19 |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | 19 | |

Load/store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | No | | |
| Store2 | Yes | 72 | [72]*R2 |
| Store3 | Yes | 64 | Mult1 |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|------|------|----|---------|---------|---------|---------|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|---|-----|
| 19 | 64 | Fu | Load3 | | Mult1 | | | | | | |

Code: `LD F0,0(R1)`; `MULTD F4,F0,F2`; `SD 0(R1),F4`; `SUBI R1,R1,#8`; `BNEZ R1,Loop`
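Cycles 18 to 20 show the store buffers draining: iteration 1's SD completes execution in cycle 18 and frees Store1 in cycle 19, and iteration 2's SD follows one cycle behind. A minimal sketch of that behavior, assuming (this is our assumption, not stated on the slides) a single memory write port that serves the oldest store whose data is ready, with one call standing for one cycle. `StoreEntry`, its `seq` field and `retire_one_store` are hypothetical names, and the address checks from the disambiguation rule are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    name: str
    seq: int               # hypothetical program-order position (smaller = older)
    addr: int
    value: Optional[str]   # None while the data is still tagged to a producer
    busy: bool = True


def retire_one_store(buffers: List[StoreEntry],
                     memory: Dict[int, str]) -> Optional[str]:
    """Assumed single write port: the oldest store with data writes memory
    and frees its buffer. Returns its name, or None if nothing is ready."""
    ready = [b for b in buffers if b.busy and b.value is not None]
    if not ready:
        return None
    oldest = min(ready, key=lambda b: b.seq)
    memory[oldest.addr] = oldest.value
    oldest.busy = False
    return oldest.name


buffers = [
    StoreEntry("Store1", 1, 80, "[80]*R2"),
    StoreEntry("Store2", 2, 72, "[72]*R2"),
    StoreEntry("Store3", 3, 64, None),   # data still tagged Mult1
]
memory: Dict[int, str] = {}
print(retire_one_store(buffers, memory))  # Store1
print(retire_one_store(buffers, memory))  # Store2
print(retire_one_store(buffers, memory))  # None: Store3 has no data yet
```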

## **Loop Example Cycle 20**

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|------|-------------|---|---|-------|-----------|--------------|
| 1 | LD F0 | 0 | R1 | 1 | 9 | 10 |
| 1 | MULTD F4 | F0 | F2 | 2 | 14 | 15 |
| 1 | SD F4 | 0 | R1 | 3 | 18 | 19 |
| 2 | LD F0 | 0 | R1 | 6 | 10 | 11 |
| 2 | MULTD F4 | F0 | F2 | 7 | 15 | 16 |
| 2 | SD F4 | 0 | R1 | 8 | 19 | 20 |

Load/store buffers:

| Name | Busy | Addr | Fu |
|------|------|------|----|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | Yes | 64 | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | Yes | 64 | Mult1 |

Reservation stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|------|------|------|----|---------|---------|---------|---------|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F2) | Load3 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | … | F30 |
|-------|----|-|----|----|----|----|----|-----|-----|---|-----|
| 20 | 64 | Fu | Load3 | | Mult1 | | | | | | |

Code: `LD F0,0(R1)`; `MULTD F4,F0,F2`; `SD 0(R1),F4`; `SUBI R1,R1,#8`; `BNEZ R1,Loop`
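By cycle 20 both tracked iterations have finished, so the Issue and Write Result columns of the cycle-20 table can be read as a timeline. The short script below copies those two columns and lists the pairs in which a later-issued instruction finished before an earlier-issued one, plus how many cycles the two iterations' lifetimes share. It is only arithmetic on the table, not a simulator; the function names are hypothetical.

```python
from typing import List, Tuple

# (iteration, op, issue cycle, write-result cycle), copied from the cycle-20 table
Row = Tuple[int, str, int, int]
status: List[Row] = [
    (1, "LD", 1, 10),
    (1, "MULTD", 2, 15),
    (1, "SD", 3, 19),
    (2, "LD", 6, 11),
    (2, "MULTD", 7, 16),
    (2, "SD", 8, 20),
]


def label(r: Row) -> str:
    return f"{r[1]}{r[0]}"  # e.g. "LD2" = iteration 2's LD


def out_of_order_pairs(rows: List[Row]) -> List[Tuple[str, str]]:
    """(earlier-issued, later-issued) pairs where the later one finished first."""
    return [(label(a), label(b)) for a in rows for b in rows
            if b[2] > a[2] and b[3] < a[3]]


def shared_cycles(rows: List[Row], i: int, j: int) -> int:
    """Cycles (inclusive) during which iterations i and j both have work in flight."""
    def span(k: int) -> Tuple[int, int]:
        mine = [r for r in rows if r[0] == k]
        return min(r[2] for r in mine), max(r[3] for r in mine)
    (s1, e1), (s2, e2) = span(i), span(j)
    return max(0, min(e1, e2) - max(s1, s2) + 1)


print(out_of_order_pairs(status))
# [('MULTD1', 'LD2'), ('SD1', 'LD2'), ('SD1', 'MULTD2')]
print(shared_cycles(status, 1, 2))  # 14 (cycles 6 through 19)
```

Iteration 2's LD writing its result in cycle 11, four cycles before iteration 1's MULTD, is the overlap that a strictly in-order pipeline would not allow here.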


#### **Why can Tomasulo overlap iterations of loops?**

- Register renaming
  - Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
- Reservation stations
  - Permit instruction issue to advance past integer control flow operations
- Other idea: Tomasulo builds a dynamic "DataFlow" graph from instructions
  - Fits in with readings for Wednesday
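The renaming point can be made concrete with the register result status table. At issue, an instruction first copies either a value or a producer tag for each source, then overwrites its destination's entry with its own station name. Iteration 2's LD F0 therefore just replaces Load1 with Load2 and never waits for iteration 1 (no WAW stall), while iteration 1's MULTD keeps the Load1 tag it captured earlier. A minimal Python sketch, assuming the station assignment of the loop example (Load1/Mult1 for iteration 1, Load2/Mult2 for iteration 2), with the symbolic value R(F2) standing in for F2's contents and hypothetical function names:

```python
from typing import Dict, List, Optional, Tuple

Operand = Tuple[str, str]  # ("V", value) if ready, ("Q", producer tag) if pending

reg_status: Dict[str, Optional[str]] = {}  # register -> tag of pending producer
regs: Dict[str, str] = {"F2": "R(F2)"}    # register file values (symbolic)


def read_operand(src: str) -> Operand:
    tag = reg_status.get(src)
    return ("Q", tag) if tag else ("V", regs[src])


def issue(station: str, dest: Optional[str], srcs: List[str]) -> List[Operand]:
    """Capture source operands first, then rename dest to this station."""
    operands = [read_operand(s) for s in srcs]
    if dest is not None:
        reg_status[dest] = station  # a later writer just overwrites: no WAW stall
    return operands


# Two iterations of LD F0 / MULTD F4,F0,F2 (integer R1 operands omitted).
issue("Load1", "F0", [])
mult1_ops = issue("Mult1", "F4", ["F0", "F2"])
issue("Load2", "F0", [])
mult2_ops = issue("Mult2", "F4", ["F0", "F2"])

print(mult1_ops)   # [('Q', 'Load1'), ('V', 'R(F2)')]
print(mult2_ops)   # [('Q', 'Load2'), ('V', 'R(F2)')]
print(reg_status)  # {'F0': 'Load2', 'F4': 'Mult2'}
```

Reading sources before renaming the destination matters for instructions such as `ADDD F0,F0,F2`, which must wait on the old producer of F0, not on itself.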

#### **Summary**

- Scoreboard: Track dependencies through reservations
  - Simple scheme for out-of-order execution
  - WAW and WAR hazards force stalls; cannot handle multiple in-flight instructions with the same destination register
- Reservation stations: renaming to a larger set of registers + buffering of source operands
  - Prevents registers from becoming a bottleneck
  - Avoids WAR, WAW hazards of Scoreboard
  - Allows loop unrolling in HW
- Dynamic hardware schemes can unroll loops at run time
  - Form of limited dataflow
  - Register renaming is essential
- Lasting Contributions of Tomasulo Algorithm
  - Dynamic scheduling
  - Register renaming
  - Load/store disambiguation
- IBM 360/91 descendants include Pentium II, PowerPC 604, MIPS R10000, HP PA-8000, and Alpha 21264
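The first two summary points come down to one issue-time check. A scoreboard must stall issue while another active instruction is due to write the same destination (WAW), whereas Tomasulo's renaming lets the new instruction overwrite the destination's tag, so only the lack of a free reservation station stops it. A minimal sketch of just those two rules; the function names and dictionary representation are hypothetical, and other hazard checks (such as the scoreboard's WAR check before write-back) are left out.

```python
from typing import Dict, Optional


def scoreboard_can_issue(dest: str, result_status: Dict[str, Optional[str]],
                         fu_free: bool) -> bool:
    """Scoreboard issue: needs a free functional unit AND no pending writer of dest."""
    return fu_free and result_status.get(dest) is None


def tomasulo_can_issue(station_free: bool) -> bool:
    """Tomasulo issue: only a free reservation station is needed; dest is renamed."""
    return station_free


# Iteration 1's LD F0 is still outstanding when iteration 2's LD F0 arrives.
pending = {"F0": "Integer"}
print(scoreboard_can_issue("F0", pending, fu_free=True))  # False: WAW stall
print(tomasulo_can_issue(station_free=True))              # True
```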
