### Reduction of Data Hazards Stalls with Dynamic Scheduling

i.e. the current pipeline: in-order, single-issue, with FP support

- So far we have dealt with <u>data hazards</u> in instruction pipelines by:
  - Result forwarding (register bypassing) to reduce or eliminate stalls needed to prevent RAW hazards as a result of true data dependence.
  - Hazard detection hardware to stall the pipeline starting with the instruction that uses the result.
    - i.e. forward + stall (if needed)
  - Compiler-based static pipeline scheduling to separate the dependent instructions minimizing actual hazard-prevention stalls in scheduled code.
    - Loop unrolling to increase basic block size: More ILP exposed.

- **Dynamic scheduling** (out-of-order execution):
  - Uses a hardware-based mechanism to <u>reorder</u> or <u>rearrange</u> instruction <u>execution order</u> to <u>reduce stalls</u> dynamically at runtime, i.e. the start of instruction execution is not in program order.
    - Better dynamic exploitation of instruction-level parallelism (ILP).

Why?

- Enables handling some cases where instruction dependencies are unknown at compile time (ambiguous dependencies).
- Similar to the other pipeline optimizations above, a dynamically scheduled processor <u>cannot remove true data dependencies</u>, but tries to avoid or reduce stalling.

Fourth Edition: Appendix A.7, Chapter 2.4, 2.5

(Third Edition: Appendix A.8, Chapter 3.2, 3.3)

#### Dynamic Pipeline Scheduling: The Concept

(Out-of-order execution: the start of instruction execution is not in program order.)

- Dynamic pipeline scheduling overcomes the limitations of in-order pipelined execution by allowing out-of-order instruction execution.
- <u>Instructions are allowed to start executing out of order as soon as</u> their operands are available.
  - Better dynamic exploitation of instruction-level parallelism (ILP).

#### **Example:**

In in-order pipelined execution, SUB.D must wait for DIV.D to complete, because DIV.D stalls ADD.D and SUB.D cannot start before ADD.D. In out-of-order execution, SUB.D can start as soon as the values of its operands F8 and F14 are available.

1. `DIV.D F0, F2, F4`
2. `ADD.D F10, F0, F8` (true data dependence on DIV.D through F0)
3. `SUB.D F12, F8, F14` (does not depend on DIV.D or ADD.D)

- This implies allowing out-of-order instruction commit (completion).
- May lead to imprecise exceptions if an instruction issued earlier raises an exception.
  - This is similar to pipelines with multi-cycle floating point units.
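The three-instruction example can be sketched as a toy timing model. This is only an illustration under stated assumptions: the latencies (40 cycles for DIV.D, 2 for ADD.D and SUB.D) are borrowed from the FP latencies used later in this lecture, one instruction enters the pipeline per cycle, and structural hazards are ignored. The function name `start_cycles` and the tuple layout are hypothetical.

```python
# Assumed execution latencies in cycles (illustrative only).
LATENCY = {"DIV.D": 40, "ADD.D": 2, "SUB.D": 2}

# (opcode, destination register, source registers)
PROGRAM = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F10", ("F0", "F8")),
    ("SUB.D", "F12", ("F8", "F14")),
]


def start_cycles(program, in_order):
    """Return the cycle in which each instruction starts execution."""
    ready_at = {}      # register -> cycle at which its new value is available
    starts = []
    prev_start = 0
    for i, (op, dest, srcs) in enumerate(program):
        earliest = i + 1   # instruction i enters the pipeline in cycle i + 1
        # RAW: wait until every source operand has been produced.
        start = max([earliest] + [ready_at.get(r, 0) for r in srcs])
        if in_order:
            # In-order start: cannot begin before the previous instruction.
            start = max(start, prev_start + 1)
        starts.append(start)
        ready_at[dest] = start + LATENCY[op]
        prev_start = start
    return starts


print(start_cycles(PROGRAM, in_order=True))
print(start_cycles(PROGRAM, in_order=False))
```

Under these assumptions the in-order model starts the instructions in cycles 1, 41, 42, while the out-of-order model starts SUB.D in cycle 3. ADD.D still waits until cycle 41 in both cases, because the true dependence on F0 cannot be removed.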


EECC551 - Shaaban

**Order = Program Instruction Order** 

#### **Dynamic Pipeline Scheduling**

- Dynamic instruction scheduling is accomplished by:
  - Dividing the Instruction Decode (ID) stage into two stages:
    - **Issue:** Decode instructions, check for <u>structural hazards</u>. Always done in program order.
      - A record of data dependencies is constructed as instructions are issued.
        - This creates a dynamically constructed dependency graph for the window of instructions in flight (being processed) in the CPU.
    - **Read operands:** Wait until data hazard conditions, if any, are resolved, then read operands when available (then start execution). Can be done out of program order.
    - (All instructions pass through the <u>issue stage in order</u> but can be stalled or pass each other in the read operands stage.)
  - In the instruction fetch stage (IF), fetch an additional instruction every cycle into a latch or several instructions into an instruction queue.
  - Increase the number of functional units to meet the demands of the additional instructions in their EX stage.
- Two approaches to dynamic scheduling:
  1. Dynamic scheduling with the <u>Scoreboard</u>, used first in the CDC6600 (Control Data Corp., 1963).
  2. The Tomasulo approach, pioneered by the IBM 360/91 (1966).


The CDC6600 was the world's first "supercomputer". Cost: \$7 million in 1963.

#### **Dynamic Scheduling With A Scoreboard**

- The scoreboard is <u>a centralized hardware mechanism</u> that maintains an execution rate of one instruction per cycle by executing an instruction as soon as its operands <u>are available in registers</u> and no hazard conditions prevent it.
  - e.g. Forming a single-issue out-of-order pipeline

- It replaces ID, EX, WB with four stages: ID1 (Issue), ID2 (Read Operands), EX (includes MEM), WB.
  - No changes to Instruction Fetch (IF).
- Every instruction goes through the scoreboard in ID1 (Issue), where <u>a record of data dependencies is constructed</u> (corresponds to instruction issue).

- In effect <u>dynamically</u> constructing the <u>dependency graph</u> by hardware for <u>a window of instructions</u> as they are issued one at a time <u>in program order</u>.
- A system with a scoreboard is assumed to have several functional units with their status information reported to the scoreboard.
- If the scoreboard determines that an instruction cannot execute immediately, it executes another waiting instruction, keeps monitoring the status of the hardware units, and decides when the instruction can proceed to execute.
- The scoreboard also decides when an instruction can write its results to registers (hazard detection and resolution is centralized in the scoreboard).


#4 lec # 4 Spring 2013 3-18-2013



#### Instruction Execution Stages with A Scoreboard

Stage 0, Instruction Fetch (IF): No changes, in-order.

- **Issue (ID1):** <u>Always</u> done in program order. An instruction is issued if:
  - A functional unit for the instruction is available (no structural hazard).
  - The instruction's result destination register is not marked for writing by an earlier active instruction (no WAW hazard, i.e. no output dependence).

  If the above conditions are satisfied, the scoreboard issues the instruction to a functional unit and updates its internal data structures. As the issue requirements indicate, structural and WAW hazards are resolved here by stalling instruction issue. (This stage replaces part of the ID stage in the conventional MIPS pipeline.)
- **Read operands (ID2):** Can be done out of program order. The scoreboard monitors the availability of the source operands. A source operand is available when no earlier active instruction will write it. When all source operands are available, the scoreboard tells the functional unit to read all operands from the registers at once (no forwarding supported) and start execution (RAW hazards are resolved here dynamically). This completes ID.
- **Execution (EX):** The functional unit starts execution upon receiving operands. When the results are ready, it notifies the scoreboard (replaces EX, MEM in MIPS).
- **Write result (WB):** Once the scoreboard senses that a functional unit has completed execution, it checks for WAR hazards and stalls the completing instruction if needed; otherwise the write back is completed. The functional unit assigned to the instruction is marked as available (not busy) after WB is completed.


#### Three Parts of the Scoreboard

1. **Instruction status:** Which of the 4 steps the instruction is in.
2. **Functional unit status:** Indicates the state of the functional unit (FU). Nine fields for each functional unit:
   - Busy: Indicates whether the unit is busy or not
   - Op: Operation to perform in the unit (e.g., + or −)
   - Fi: Destination register
   - Fj, Fk: Source-register numbers, i.e. operand registers
   - Qj, Qk: Functional units producing source (operand) registers Fj, Fk
   - Rj, Rk: Flags indicating when Fj, Fk are ready; Yes (= 1) means ready. A flag is set to Yes after the operand is available to read. Both operands are read at once from registers, i.e. when both Rj and Rk are set to Yes (both operands are ready).
3. **Register result status:** Indicates which functional unit will write to each register, if one exists. Blank when no pending instructions will write that register. Needed to check for a possible WAW hazard and stall issue.

| F0   | F1 | F2    | F3 | ... | F31 |
|------|----|-------|----|-----|-----|
| Add1 | -- | Mult1 | -- | ... | --  |
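The three parts of the scoreboard can be pictured as plain data structures. The sketch below is one possible layout, not the CDC6600 hardware organization: the class name `FunctionalUnitStatus`, the unit names in `FU_NAMES`, and the helper `waw_hazard` are hypothetical, and the two register entries simply copy the sample register result row shown above.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class FunctionalUnitStatus:
    """Part 2: the nine fields kept for each functional unit."""
    busy: bool = False           # Busy
    op: Optional[str] = None     # Op: operation to perform
    fi: Optional[str] = None     # Fi: destination register
    fj: Optional[str] = None     # Fj: first source register
    fk: Optional[str] = None     # Fk: second source register
    qj: Optional[str] = None     # Qj: FU producing Fj (None = no producer)
    qk: Optional[str] = None     # Qk: FU producing Fk (None = no producer)
    rj: bool = False             # Rj: Fj is ready to be read
    rk: bool = False             # Rk: Fk is ready to be read


FIELD_COUNT = len(fields(FunctionalUnitStatus))  # nine fields per unit

# Part 1: instruction status, one entry per instruction recording the
# cycle in which each of the four steps finished (None = not yet).
STEPS = ("issue", "read_operands", "execution_complete", "write_result")
instruction_status: Dict[str, Dict[str, Optional[int]]] = {
    "MUL.D F0, F2, F4": {step: None for step in STEPS},
}

# Assumed unit names, for illustration only.
FU_NAMES = ("Integer", "Mult1", "Mult2", "Add1", "Divide")
fu_status = {name: FunctionalUnitStatus() for name in FU_NAMES}

# Part 3: register result status, None meaning "blank" (no pending writer).
register_result: Dict[str, Optional[str]] = {f"F{i}": None for i in range(32)}
register_result["F0"] = "Add1"    # as in the sample row above
register_result["F2"] = "Mult1"


def waw_hazard(dest: str) -> bool:
    """Issue must stall if some active instruction will still write dest."""
    return register_result[dest] is not None
```

With the sample row loaded, `waw_hazard("F2")` is true (Mult1 will write F2), while `waw_hazard("F4")` is false.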

#### The Scoreboard: Detailed Pipeline Control

| Instruction status | Wait until                                                                    | Bookkeeping                                                                                                                                                  |
|--------------------|-------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Issue              | Not Busy(FU)<br>and not Result(D)                                             | Busy(FU)← Yes; Op(FU)← op;<br>Fi(FU)← 'D'; Fj(FU)← 'S1';<br>Fk(FU)← 'S2'; Qj← Result('S1');<br>Qk← Result('S2'); Rj← not Qj;<br>Rk← not Qk; Result('D')← FU |
| Read operands      | Rj and Rk                                                                     | Rj← No; Rk← No                                                                                                                                               |
| Execution complete | Functional unit done                                                          |                                                                                                                                                              |
| Write<br>result    | ∀f((Fj(f)≠Fi(FU)<br>or Rj(f)=No) &<br>(Fk(f)≠Fi(FU) or<br>Rk(f)=No)) | ∀f(if Qj(f)=FU then Rj(f)← Yes);<br>∀f(if Qk(f)=FU then Rk(f)← Yes);<br>Result(Fi(FU))← 0; Busy(FU)← No                                                      |

DAP Spr. '98 @UCB 30
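The wait conditions and bookkeeping in the table can be exercised with a small cycle-by-cycle simulator. This is a minimal sketch under stated assumptions, not the CDC6600 control logic: every decision in a cycle uses only the state left by earlier cycles, an instruction advances at most one stage per cycle, the ready flags Rj/Rk are cleared once the operands have been read (as in the textbook version of the table), and Qj/Qk are cleared when the producer writes back. The program, unit counts, and latencies are those of the six-instruction example in this lecture; `simulate` and the dictionary layout are hypothetical.

```python
LATENCY = {"L.D": 1, "MUL.D": 10, "SUB.D": 2, "ADD.D": 2, "DIV.D": 40}
UNITS = {"L.D": ["Integer"], "MUL.D": ["Mult1", "Mult2"],
         "SUB.D": ["Add"], "ADD.D": ["Add"], "DIV.D": ["Divide"]}
PROGRAM = [  # (op, dest, src1, src2); R2/R3 are never pending, so always ready
    ("L.D", "F6", None, "R2"),
    ("L.D", "F2", None, "R3"),
    ("MUL.D", "F0", "F2", "F4"),
    ("SUB.D", "F8", "F6", "F2"),
    ("DIV.D", "F10", "F0", "F6"),
    ("ADD.D", "F6", "F8", "F2"),
]


def simulate(program):
    fus = {u: None for units in UNITS.values() for u in units}  # None = free
    result = {}                      # register -> FU that will write it
    times = [{} for _ in program]    # per instruction: stage -> cycle
    next_issue, cycle = 0, 0
    while any("write" not in t for t in times):
        cycle += 1
        # ---- decide, looking only at state from earlier cycles ----
        issue_fu = None
        if next_issue < len(program):
            op, dest = program[next_issue][:2]
            free = [u for u in UNITS[op] if fus[u] is None]
            if free and dest not in result:   # no structural or WAW hazard
                issue_fu = free[0]
        readers = [u for u, f in fus.items()
                   if f and "read" not in times[f["idx"]]
                   and f["rj"] and f["rk"]]
        writers = []
        for u, f in fus.items():
            t = times[f["idx"]] if f else {}
            if "exec" in t and t["exec"] < cycle:
                # WAR: another unit still has to read the old value of Fi.
                war = any(g and g is not f and
                          ((g["fj"] == f["fi"] and g["rj"]) or
                           (g["fk"] == f["fi"] and g["rk"]))
                          for g in fus.values())
                if not war:
                    writers.append(u)
        # ---- apply bookkeeping: issue, then read, then write ----
        if issue_fu:
            op, dest, s1, s2 = program[next_issue]
            qj, qk = result.get(s1), result.get(s2)
            fus[issue_fu] = dict(idx=next_issue, fi=dest, fj=s1, fk=s2,
                                 qj=qj, qk=qk, rj=qj is None, rk=qk is None)
            result[dest] = issue_fu
            times[next_issue]["issue"] = cycle
            next_issue += 1
        for u in readers:
            f = fus[u]
            f["rj"] = f["rk"] = False         # operands have been read
            t = times[f["idx"]]
            t["read"] = cycle
            t["exec"] = cycle + LATENCY[program[f["idx"]][0]]
        for u in writers:
            f = fus[u]
            for g in fus.values():            # wake up waiting consumers
                if g and g["qj"] == u:
                    g["qj"], g["rj"] = None, True
                if g and g["qk"] == u:
                    g["qk"], g["rk"] = None, True
            del result[f["fi"]]
            times[f["idx"]]["write"] = cycle
            fus[u] = None                     # unit is free from next cycle
    return times


for (op, dest, _, _), t in zip(PROGRAM, simulate(PROGRAM)):
    print(op, dest, t["issue"], t["read"], t["exec"], t["write"])
```

Under these assumptions the sketch issues the six instructions in cycles 1, 5, 6, 7, 8, 13, and MUL.D and SUB.D both read their operands in cycle 9, one cycle after the second L.D writes F2. ADD.D finishes executing in cycle 16 but is held at write result until cycle 22 by the WAR hazard on F6 with DIV.D.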


#### A Scoreboard Example

The following code is run on the MIPS with a scoreboard given earlier with:

| Functional unit (FU)    | # of FUs | EX cycles |
|-------------------------|----------|-----------|
| Integer                 | 1        | 1         |
| Floating Point Multiply | 2        | 10        |
| Floating Point Add      | 1        | 2         |
| Floating Point Divide   | 1        | 40        |

1. `L.D F6, 34(R2)`
2. `L.D F2, 45(R3)`
3. `MUL.D F0, F2, F4`
4. `SUB.D F8, F6, F2`
5. `DIV.D F10, F0, F6`
6. `ADD.D F6, F8, F2`

None of the functional units are pipelined (similar to the CDC6600).

The dependency graph for this code marks three kinds of dependence: real data dependence (RAW), anti-dependence (WAR), and output dependence (WAW).
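The three kinds of dependence in the six-instruction example can be enumerated mechanically. The sketch below is a simple pairwise check over register names, assuming the tuple layout shown. The function name `dependences` is hypothetical, and the check does not account for an intervening redefinition of a register, which does not occur in this program.

```python
# (opcode, destination register, source registers), numbered 1..6 in order
PROGRAM = [
    ("L.D",   "F6",  ["R2"]),
    ("L.D",   "F2",  ["R3"]),
    ("MUL.D", "F0",  ["F2", "F4"]),
    ("SUB.D", "F8",  ["F6", "F2"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6",  ["F8", "F2"]),
]


def dependences(program):
    """Return (kind, earlier, later, register) for each dependent pair."""
    deps = []
    for j, (_, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dest_i, srcs_i = program[i]
            if dest_i in srcs_j:      # later reads what earlier writes
                deps.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:      # later writes what earlier reads
                deps.append(("WAR", i + 1, j + 1, dest_j))
            if dest_j == dest_i:      # both write the same register
                deps.append(("WAW", i + 1, j + 1, dest_j))
    return deps


for dep in dependences(PROGRAM):
    print(dep)
```

For this program the check reports seven RAW pairs, two anti-dependences on F6 (SUB.D to ADD.D and DIV.D to ADD.D), and one output dependence on F6 between the first L.D and ADD.D.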


FP EX Cycles: Add = 2 cycles, Multiply = 10, Divide = 40

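### Scoreboard Example: Cycle 1

Each entry k in the instruction status table means "at the end of cycle k".

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     |               |                    |              |
| L.D F2      | 45+ | R3 |       |               |                    |              |
| MUL.D F0    | F2  | F4 |       |               |                    |              |
| SUB.D F8    | F6  | F2 |       |               |                    |              |
| DIV.D F10   | F0  | F6 |       |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name      | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|-----------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | → Integer | Yes  | Load | F6        |         | R2      |               |               |          | Yes      |
|      | Mult1     | No   |      |           |         |         |               |               |          |          |
|      | Mult2     | No   |      |           |         |         |               |               |          |          |
|      | Add       | No   |      |           |         |         |               |               |          |          |
|      | Divide    | No   |      |           |         |         |               |               |          |          |

**Register result status**

| Clock | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|---------|----|-----|-----|-----|-----|
| 1     |    |    |    | Integer |    |     |     |     |     |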


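### Scoreboard Example: Cycle 2

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             |                    |              |
| L.D F2      | 45+ | R3 | ?     |               |                    |              |
| MUL.D F0    | F2  | F4 |       |               |                    |              |
| SUB.D F8    | F6  | F2 |       |               |                    |              |
| DIV.D F10   | F0  | F6 |       |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | Yes  | Load | F6        |         | R2      |               |               |          | Yes      |
|      | Mult1   | No   |      |           |         |         |               |               |          |          |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | No   |      |           |         |         |               |               |          |          |
|      | Divide  | No   |      |           |         |         |               |               |          |          |

**Register result status**

| Clock | F0 | F2 | F4 | F6      | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|---------|----|-----|-----|-----|-----|
| 2     |    |    |    | Integer |    |     |     |     |     |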

Issue second L.D? No, stall on structural hazard. Single integer functional unit is busy.



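Cycle 3: Issue MUL.D? No, cannot issue out of order (second L.D not issued yet).

Cycle 4: Issue second L.D? No, the Integer unit stays busy until the first L.D has written its result in this cycle.

Cycle 5: Issue second L.D.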

### Scoreboard Example: Cycle 6

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             |                    |              |
| MUL.D F0    | F2  | F4 | 6     |               |                    |              |
| SUB.D F8    | F6  | F2 |       |               |                    |              |
| DIV.D F10   | F0  | F6 |       |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | Yes  | Load | F2        |         | R3      |               |               |          | Yes      |
|      | → Mult1 | Yes  | Mult | F0        | F2      | F4      | Integer       |               | No       | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | No   |      |           |         |         |               |               |          |          |
|      | Divide  | No   |      |           |         |         |               |               |          |          |

**Register result status**

| Clock | F0    | F2      | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|-------|---------|----|----|----|-----|-----|-----|-----|
| 6     | Mult1 | Integer |    |    |    |     |     |     |     |

Issue MUL.D.

### Scoreboard Example: Cycle 7

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  |              |
| MUL.D F0    | F2  | F4 | 6     | ?             |                    |              |
| SUB.D F8    | F6  | F2 | 7     |               |                    |              |
| DIV.D F10   | F0  | F6 |       |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | Yes  | Load | F2        |         | R3      |               |               |          | Yes      |
|      | Mult1   | Yes  | Mult | F0        | F2      | F4      | Integer       |               | No       | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | → Add   | Yes  | Sub  | F8        | F6      | F2      |               | Integer       | Yes      | No       |
|      | Divide  | No   |      |           |         |         |               |               |          |          |

**Register result status**

| Clock | F0    | F2      | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|-------|-------|---------|----|----|-----|-----|-----|-----|-----|
| 7     | Mult1 | Integer |    |    | Add |     |     |     |     |

Issue SUB.D. Read multiply operands? Not yet: Mult1 is still waiting for F2 from the Integer unit (Rj = No).

### Scoreboard Example: Cycle 8a (First half of cycle 8)

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  |              |
| MUL.D F0    | F2  | F4 | 6     |               |                    |              |
| SUB.D F8    | F6  | F2 | 7     |               |                    |              |
| DIV.D F10   | F0  | F6 | 8     |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name     | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|----------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer  | Yes  | Load | F2        |         | R3      |               |               |          | Yes      |
|      | Mult1    | Yes  | Mult | F0        | F2      | F4      | Integer       |               | No       | Yes      |
|      | Mult2    | No   |      |           |         |         |               |               |          |          |
|      | Add      | Yes  | Sub  | F8        | F6      | F2      |               | Integer       | Yes      | No       |
|      | → Divide | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

**Register result status**

| Clock | F0    | F2      | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|-------|---------|----|----|-----|--------|-----|-----|-----|
| 8     | Mult1 | Integer |    |    | Add | Divide |     |     |     |

Issue DIV.D.

### Scoreboard Example: Cycle 8b (Second half of cycle 8)

State at the end of cycle 8.

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MUL.D F0    | F2  | F4 | 6     |               |                    |              |
| SUB.D F8    | F6  | F2 | 7     |               |                    |              |
| DIV.D F10   | F0  | F6 | 8     |               |                    |              |
| ADD.D F6    | F8  | F2 |       |               |                    |              |

**Functional unit status**

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
|      | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | Yes  | Sub  | F8        | F6      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

**Register result status**

| Clock | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|-------|----|----|----|-----|--------|-----|-----|-----|
| 8     | Mult1 |    |    |    | Add | Divide |     |     |     |

Second L.D writes result to F2.


### Scoreboard Example: Cycle 9

**Instruction status**

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MUL.D F0    | F2  | F4 | 6     | 9             |                    |              |
| SUB.D F8    | F6  | F2 | 7     | 9             |                    |              |
| DIV.D F10   | F0  | F6 | 8     |               |                    |              |
| ADD.D F6    | F8  | F2 | ?     |               |                    |              |

Issue ADD.D?


<u>Functional unit status</u> (Time = execution cycles remaining; execution actually starts next cycle)

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
| 10   | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
| 2    | Add     | Yes  | Sub  | F8        | F6      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

<u>Register result status</u>

| Clock |    | F0    | F2 | F4 | F6 | F8  | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|----|-----|--------|-----|-----|-----|
| 9     | FU | Mult1 |    |    |    | Add | Divide |     |     |     |

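- Read operands for MUL.D and SUB.D; execution starts next cycle for both instructions.
- Issue ADD.D? No: structural hazard, since the Add unit is busy with SUB.D.

- Issue ADD.D? Still no: SUB.D has not yet written its result, so the Add unit remains busy.

- Read operands for DIV.D? No: F0 is not yet available (Mult1 is still executing MUL.D).
- Issue ADD.D? No: the Add unit is still busy.

- Issue ADD.D: the Add FP unit becomes available at the end of cycle 12 (start of cycle 13).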
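
The issue decision for ADD.D can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own notation: the function name `can_issue` and the dictionary layout are assumptions, and the state is copied from the clock 9 tables. It applies the usual scoreboard issue test: the required functional unit must be free (no structural hazard) and no active instruction may already target the same destination register (no WAW hazard).

```python
def can_issue(fu_name, dest, fu_busy, reg_result):
    """Scoreboard issue test (hypothetical helper).

    The functional unit must be free (no structural hazard) and no active
    instruction may already have `dest` as its destination (no WAW hazard).
    """
    return not fu_busy[fu_name] and reg_result.get(dest) is None


# State at clock 9: the Add unit is held by SUB.D (destination F8).
fu_busy = {"Integer": False, "Mult1": True, "Mult2": False,
           "Add": True, "Divide": True}
reg_result = {"F0": "Mult1", "F8": "Add", "F10": "Divide"}
issue_at_9 = can_issue("Add", "F6", fu_busy, reg_result)

# SUB.D writes its result in cycle 12, releasing the Add unit and F8.
fu_busy["Add"] = False
del reg_result["F8"]
issue_at_13 = can_issue("Add", "F6", fu_busy, reg_result)

print(issue_at_9, issue_at_13)  # False True
```

ADD.D is blocked only by the structural hazard on the Add unit; no active instruction writes F6, so the WAW part of the test never triggers in this example.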

<u>Instruction status</u>

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MUL.D F0    | F2  | F4 | 6     | 9             |                    |              |
| SUB.D F8    | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIV.D F10   | F0  | F6 | 8     |               |                    |              |
| ADD.D F6    | F8  | F2 | 13    | 14            | 16                 |              |

<u>Functional unit status</u>

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
| 2    | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add  | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

<u>Register result status</u>

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 17    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

Write result of ADD.D? No: WAR hazard on F6. DIV.D has not yet read its operands, which include F6.
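
The WAR test at the write-result stage can be sketched as follows. The helper name `can_write_result` and the dictionary layout are assumptions made for illustration; the register fields are copied from the clock 17 functional unit table. The sketch also assumes the textbook bookkeeping in which Rj/Rk are cleared once a unit has read its operands, which the tables here do not show explicitly.

```python
def can_write_result(writer, units):
    """WAR check before write result (hypothetical helper).

    The writer must stall while any other unit still has the writer's
    destination as a source operand that is ready but not yet read.
    """
    dest = units[writer]["Fi"]
    for name, fu in units.items():
        if name == writer:
            continue
        if (fu["Fj"] == dest and fu["Rj"]) or (fu["Fk"] == dest and fu["Rk"]):
            return False
    return True


# Busy functional units at clock 17 (values taken from the table above).
units = {
    "Mult1":  {"Fi": "F0",  "Fj": "F2", "Fk": "F4", "Rj": True,  "Rk": True},
    "Add":    {"Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": True,  "Rk": True},
    "Divide": {"Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}
write_at_17 = can_write_result("Add", units)

# Assumption: after MUL.D has written its result and DIV.D has read its
# operands, Mult1 is released and the Divide flags are cleared.
del units["Mult1"]
units["Divide"]["Rj"] = units["Divide"]["Rk"] = False
write_after_div_read = can_write_result("Add", units)

print(write_at_17, write_after_div_read)  # False True
```

The check fails at clock 17 only because of the Divide unit: its Fk is F6 and that operand is still waiting to be read.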

<u>Instruction status</u>

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MUL.D F0    | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUB.D F8    | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIV.D F10   | F0  | F6 | 8     | ?             |                    |              |
| ADD.D F6    | F8  | F2 | 13    | 14            | 16                 | ?            |

<u>Functional unit status</u>

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

<u>Register result status</u>

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|--------|-----|-----|-----|
| 20    | FU |    |    |    | Add |    | Divide |     |     |     |

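- Read operands for DIV.D? Not in cycle 20: MUL.D writes F0 in this cycle, so DIV.D reads its operands in cycle 21.
- Write result of ADD.D? No: the WAR hazard on F6 persists until DIV.D reads its operands.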
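
How the MUL.D write in cycle 20 unblocks DIV.D can be sketched as below. The helper names `write_result` and `can_read_operands` and the dictionary layout are assumptions for illustration; the Divide entry is copied from the clock 17 table, before MUL.D has written F0.

```python
def write_result(finished_fu, units):
    """When `finished_fu` writes its result, mark every operand that was
    waiting on it as available (hypothetical helper)."""
    for fu in units.values():
        if fu["Qj"] == finished_fu:
            fu["Qj"], fu["Rj"] = None, True
        if fu["Qk"] == finished_fu:
            fu["Qk"], fu["Rk"] = None, True


def can_read_operands(fu):
    """RAW check: both source operands must be available."""
    return fu["Rj"] and fu["Rk"]


# Divide unit while MUL.D is still executing: F0 is pending from Mult1.
units = {
    "Divide": {"Op": "Div", "Fi": "F10", "Fj": "F0", "Fk": "F6",
               "Qj": "Mult1", "Qk": None, "Rj": False, "Rk": True},
}
ready_before = can_read_operands(units["Divide"])

write_result("Mult1", units)  # MUL.D writes F0 in cycle 20
ready_after = can_read_operands(units["Divide"])

print(ready_before, ready_after)  # False True
```

After the write, Qj is empty and Rj is Yes, matching the Divide row of the clock 20 table.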

<u>Instruction status</u>

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| L.D F6      | 34+ | R2 | 1     | 2             | 3                  | 4            |
| L.D F2      | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MUL.D F0    | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUB.D F8    | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIV.D F10   | F0  | F6 | 8     | 21            |                    |              |
| ADD.D F6    | F8  | F2 | 13    | 14            | 16                 | ?            |

<u>Functional unit status</u> (Time = execution cycles remaining; execution actually starts next cycle)

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add | F6        | F8      | F2      |               |               | Yes      | Yes      |
| 40   | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

<u>Register result status</u>

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|--------|-----|-----|-----|
| 21    | FU |    |    |    | Add |    | Divide |     |     |     |

- DIV.D reads operands, starts execution next cycle
- Can ADD.D write its result? Not yet: DIV.D reads F6 in this cycle (WAR hazard), so ADD.D writes in the next cycle.

| Instruction status |  |  |  |  | Read | Execution | Write |  |  |  |  |  |
|----------|---------|---------|----------|-------|----------|-----------|----------------|------------|----------|------------------|-----|-----|
| Instruction |  | j | k | Issue | operands | complete | Result |  |  |  |  |  |
| L.D      | F6      | 34+     | R2       | 1     | 2        | 3         | 4              |            |          |                  |     |     |
| L.D      | F2      | 45+     | R3       | 5     | 6        | 7         | 8              |            |          |                  |     |     |
| MUL.D    | F0      | F2      | F4       | 6     | 9        | 19        | 20             |            |          |                  |     |     |
| SUB.D    | F8      | F6      | F2       | 7     | 9        | 11        | 12             |            |          |                  |     |     |
| DIV.D    | F10     | F0      | F6       | 8     | 21       |           |                |            |          |                  |     |     |
| ADD.D    | F6      | F8      | F2       | 13    | 14       | 16        | 22             |            |          |                  |     |     |
| Functional unit status |  |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|  | Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  |  | Integer |  | No |  |  |  |  |  |  |  |  |
|  |  | Mult1 |  | No |  |  |  |  |  |  |  |  |
|  |  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  |  | Add |  | No |  |  |  |  |  |  |  |  |
|  | 39 | Divide |  | Yes | Div | F10 | F0 | F6 |  |  | Yes | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 22 |  |  | FU |  |  |  |  |  | Divide |  |  |  |

- First cycle of DIV.D execution (39 more EX cycles remain).
- ADD.D writes its result to F6 (no WAR hazard: DIV.D read its operands in cycle 21).

|    | Instruction status |  |  |  |  | Read | Execution | Write |  |  |  |  |  |
|----|----------------|------------------------|----------|-------------|-------|--------|----------|------------------|------------|----------|----------|-----|-----|
|    | Instruction |  | j | k | Issue | operands | complete | Result |  |  |  |  |  |
|    | L.D            | F6                     | 34+      | R2          | 1     | 2      | 3        | 4                |            |          |          |     |     |
|    | L.D            | F2                     | 45+      | R3          | 5     | 6      | 7        | 8                |            |          |          |     |     |
|    | MUL.D          | F0                     | F2       | F4          | 6     | 9      | 19       | 20               |            |          |          |     |     |
|    | SUB.D          | F8                     | F6       | F2          | 7     | 9      | 11       | 12               |            |          |          |     |     |
|    | DIV.D          | F10                    | F0       | F6          | 8     | 21     | 61       |                  |            |          |          |     |     |
|    | ADD.D          | F6                     | F8       | F2          | 13    | 14     | 16       | 22               |            |          |          |     |     |
|    | Functional unit status |  |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
|    |  | Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|    |  |  | Integer |  | No |  |  |  |  |  |  |  |  |
|    |  |  | Mult1 |  | No |  |  |  |  |  |  |  |  |
|    |  |  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|    |  |  | Add |  | No |  |  |  |  |  |  |  |  |
|    |  | 0 | Divide |  | Yes | Div | F10 | F0 | F6 |  |  | Yes | Yes |
|    | Register result status |  |  |  |  |  |  |  |  |  |  |  |  |
|    | Clock |  |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|    | 61 |  |  | FU |  |  |  |  |  | Divide |  |  |  |

DIV.D done executing





#### Dynamic Scheduling: The Tomasulo Algorithm

- Developed at IBM and first implemented in IBM's 360/91 mainframe in 1966, about 3 years after the debut of the scoreboard in the CDC 6600.
- Dynamically schedule the pipeline in hardware to reduce stalls.
- Differences between IBM 360 & CDC 6600 ISA.
  - IBM has only 2 register specifiers/instr vs. 3 in CDC 6600.
  - IBM has 4 FP registers vs. 8 in CDC 6600 (part of ISA).
- Current CPU architectures that can be considered descendants of the IBM 360/91, and which implement a variation of the Tomasulo algorithm, include:
  - RISC CPUs: Alpha 21264, HP 8600, MIPS R12000, PowerPC G4, ...
  - RISC-core x86 CPUs: AMD Athlon, Intel Pentium III, Pentium 4, Xeon, ...

In Fourth Edition: Chapter 2.4 (In Third Edition: Chapter 3.2)

#### Tomasulo Algorithm Vs. Scoreboard

- Control & buffers are *distributed* with the Functional Units (FUs) vs. centralized in the scoreboard:
  - FU buffers are called <u>"reservation stations"</u> (RSs); they hold pending instructions, their operands, and other instruction status info (including data dependencies).
  - Reservation stations are sometimes referred to as "physical registers" or "renaming registers", as opposed to the architectural registers specified by the ISA.
- ISA registers in instructions are replaced by either values (if available) or pointers to the reservation stations (RS) that will supply the value later. This process is called <u>register renaming</u> and is done in the issue stage (in order).
- Register renaming eliminates WAR and WAW hazards (name dependences).
- Allows for a hardware-based version of loop unrolling.
- More reservation stations than ISA registers are possible, which enables optimizations that compilers can't achieve and prevents the number of ISA registers from becoming a bottleneck.
- Forwarding: instruction results go (are forwarded) from RSs to RSs, *not through registers*, over the *Common Data Bus (CDB)*, which broadcasts results to all waiting RSs (dependent instructions).
- Loads and stores are treated as FUs with RSs as well.
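
The renaming idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function `rename` and the tags `RS1`, `RS2`, ... are hypothetical names, every instruction gets a fresh tag, and tags are never freed (a real machine reuses a small, fixed set of reservation stations). A source that has no pending producer keeps its ISA register name, standing for "value read from the register file at issue".

```python
def rename(program):
    """Rename registers in program order, as the issue stage would.

    program: list of (op, dest, src1, src2) using ISA register names.
    Each destination becomes a fresh tag; each source becomes the tag of
    its most recent producer, or stays an ISA register if none is pending.
    """
    latest_tag = {}   # ISA register -> tag of the station that will write it
    renamed = []
    for n, (op, dest, src1, src2) in enumerate(program, start=1):
        tag = f"RS{n}"                     # hypothetical tag, one per instruction
        s1 = latest_tag.get(src1, src1)    # pointer to producer, or the register itself
        s2 = latest_tag.get(src2, src2)
        latest_tag[dest] = tag             # later readers of dest wait on this tag
        renamed.append((op, tag, s1, s2))
    return renamed


program = [
    ("MUL.D", "F0", "F2", "F4"),
    ("SUB.D", "F8", "F6", "F2"),
    ("DIV.D", "F10", "F0", "F6"),   # RAW on F0 (from MUL.D)
    ("ADD.D", "F6", "F8", "F2"),    # WAR on F6 with DIV.D, RAW on F8
]

for instr in rename(program):
    print(instr)
```

After renaming, DIV.D reads `RS1` (the true dependence on MUL.D is kept) and the old value of F6, while ADD.D writes `RS4` rather than F6, so the WAR hazard between DIV.D and ADD.D no longer exists.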

#### IBM 360/91 Vs. CDC 6600

|  | IBM 360/91 | CDC 6600 (Control Data Corp.) |
|---|---|---|
| Scheduling | Tomasulo-based (1966) | Scoreboard-based (1963) |
| Functional units | Pipelined functional units (6 load, 3 store, 3 +, 2 ×/÷) | Multiple functional units, not pipelined (1 load/store, 1 +, 2 ×, 1 ÷) |
| Window size | ≤ 14 instructions | ≤ 5 instructions |
| Structural hazard | No issue on structural hazard | Same |
| WAW | Renaming avoids it (eliminated by register renaming) | Stall issue (ID1) |
| WAR | Renaming avoids it (eliminated by register renaming) | Stall completion (WB) |
| Results | Broadcast results from FU over the CDB (implements forwarding) | Write/read registers (forwarding *not* supported) |
| Control | Reservation stations, distributed | Central scoreboard |



#### Reservation Station (RS) Fields

- Op: Operation to perform in the unit (e.g., + or –).
- Vj, Vk: Values of the source operands S1 and S2, when available (i.e. the operand values needed by the instruction).
  - Store buffers have a single V field holding the result to be stored.
- Qj, Qk: Reservation stations producing the source registers (i.e. the values to be written).
  - No ready flags as in the scoreboard; Qj, Qk = 0 => ready.
  - Store buffers only have Qi, for the RS producing the result to be stored.
- A: Address information for loads or stores. Initially the immediate field of the instruction, then the effective address once calculated.
- Busy: Indicates reservation station is busy.
- Register result status: Qi indicates which reservation station will write each register, if one exists.
  - Blank (or 0) when no pending instruction (i.e. RS) exists that will write to that register.

The register bank behaves like a reservation station: it listens to the CDB for data.
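
One possible way to picture these fields is as a small record per reservation station. The sketch below is an assumption-laden illustration, not the hardware layout: the class name `ReservationStation`, the method `operands_ready`, and the use of `None` in place of "0/blank" are all choices made here, and operand values are kept as symbolic strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None    # Op: operation to perform
    vj: Optional[str] = None    # Vj: value of source operand S1, when available
    vk: Optional[str] = None    # Vk: value of source operand S2, when available
    qj: Optional[str] = None    # Qj: station producing S1 (None stands for 0)
    qk: Optional[str] = None    # Qk: station producing S2 (None stands for 0)
    a: Optional[str] = None     # A: address information (loads/stores only)

    def operands_ready(self) -> bool:
        # No separate ready flags: an operand is ready when its Q field is empty.
        return self.busy and self.qj is None and self.qk is None


# Illustrative state: a multiply with both operands, a divide waiting on it.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vj="M(45+R3)", vk="R(F4)")
mult2 = ReservationStation("Mult2", busy=True, op="DIVD", vk="M(34+R2)", qj="Mult1")

# Register result status: which station will write each register (None = no one).
register_result_status: Dict[str, Optional[str]] = {"F0": "Mult1", "F4": None, "F10": "Mult2"}

print(mult1.operands_ready(), mult2.operands_ready())
```

Here `Mult2` is busy but not ready, because its `qj` field still names `Mult1` as the producer of its first operand.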

#### Three Stages of Tomasulo Algorithm

Stage 0, Instruction Fetch (IF): no changes, in order.

- **Issue:** Get instruction from the pending Instruction Queue (IQ). **Always** done in program order.
  - Instruction is issued to a free reservation station (RS) (no structural hazard).
  - Selected RS is marked busy.
  - **Control** sends available instruction operand values (from ISA registers) to the assigned RS.
  - Operands not available yet are renamed to the RSs that will produce them (register renaming). (<u>Dynamic construction of data dependency graph</u>)
- **Execution (EX):** Operate on operands. Also includes waiting for operands + MEM. Data dependencies are observed.
  - When both operands are ready, start executing on the assigned FU.
  - If not all operands are ready, watch the Common Data Bus (CDB) for the needed result (forwarding done via CDB), i.e. wait on any remaining operands (no RAW).
- **Write result (WB):** Finish execution.
  - Write result on the Common Data Bus (CDB) to all awaiting units (RSs), and also to the destination register, i.e. broadcast the result on the CDB (forwarding).
  - Mark reservation station as available.
  - Note: No WB for stores or branches.

Execution and write result can be done out of program order.

Normal data bus: data + destination ("go to" bus).

<u>Common Data Bus (CDB):</u> data + source ("come from" bus):

- 64 bits for data + 4 bits for Functional Unit source address.
- Write data to a waiting RS if the source matches the RS it expects to produce the result.
- Does the result forwarding via broadcast to waiting RSs (including the destination register).

## Steps in The Tomasulo Approach and The Requirements of Each Step

| Instruction status | Wait until                                    | Action or bookkeeping                                                                                                                                                                                                                                                     |
|--------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Issue              | Station or buffer empty                       | <pre>if (Register[S1].Qi ≠ 0) {RS[r].Qj ← Register[S1].Qi} else {RS[r].Vj ← S1; RS[r].Qj ← 0}; if (Register[S2].Qi ≠ 0) {RS[r].Qk ← Register[S2].Qi} else {RS[r].Vk ← S2; RS[r].Qk ← 0}; RS[r].Busy ← yes; Register[D].Qi ← r;</pre> |
| Execute            | (RS[r].Qj=0) and<br>(RS[r].Qk=0)              | None—operands are in Vj and Vk                                                                                                                                                                                                                                            |
| Write result       | Execution completed at r<br>and CDB available | <pre>∀x(if (Register[x].Qi=r) {Fx← result;     Register[x].Qi←0}); ∀x(if (RS[x].Qj=r) {RS[x].Vj← result;     RS[x].Qj ←0}); ∀x(if (RS[x].Qk=r) {RS[x].Vk← result;     RS[x].Qk ←0}); ∀x(if (Store[x].Qi=r) {Store[x].V← result;     Store[x].Qi ←0}); RS[r].Busy←No</pre> |
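
The bookkeeping in the table can be transcribed almost line by line into Python. The sketch below is a simplified illustration, not a full simulator: the function names (`issue`, `can_execute`, `write_result`), the dictionary layout, and the register values are assumptions made here; `None` stands for the table's 0, and store buffers are left out.

```python
def empty_station():
    return {"busy": False, "op": None, "Vj": None, "Vk": None, "Qj": None, "Qk": None}


def issue(rs, reg_stat, regs, r, op, dest, s1, s2):
    """Issue step: fill station r, renaming sources that are still pending."""
    st = rs[r]
    assert not st["busy"], "issue waits until the station is empty"
    for src, v, q in ((s1, "Vj", "Qj"), (s2, "Vk", "Qk")):
        if reg_stat.get(src) is not None:      # Register[src].Qi != 0
            st[q], st[v] = reg_stat[src], None  # point at the producing station
        else:
            st[v], st[q] = regs[src], None      # copy the value from the register
    st["busy"], st["op"] = True, op
    reg_stat[dest] = r                          # Register[D].Qi <- r


def can_execute(rs, r):
    """Execute step condition: both Q fields are empty."""
    st = rs[r]
    return st["busy"] and st["Qj"] is None and st["Qk"] is None


def write_result(rs, reg_stat, regs, r, result):
    """Write-result step: broadcast (r, result) to registers and stations."""
    for reg, tag in list(reg_stat.items()):
        if tag == r:
            regs[reg] = result
            reg_stat[reg] = None
    for st in rs.values():
        if st["Qj"] == r:
            st["Vj"], st["Qj"] = result, None
        if st["Qk"] == r:
            st["Vk"], st["Qk"] = result, None
    rs[r]["busy"] = False


# Illustrative register contents (made-up values).
rs = {name: empty_station() for name in ("Add1", "Mult1", "Mult2")}
regs = {"F0": 0.0, "F2": 2.0, "F4": 3.0, "F6": 12.0, "F8": 1.0, "F10": 0.0}
reg_stat = {}

issue(rs, reg_stat, regs, "Mult1", "MUL.D", "F0", "F2", "F4")
issue(rs, reg_stat, regs, "Mult2", "DIV.D", "F10", "F0", "F6")   # waits on Mult1
issue(rs, reg_stat, regs, "Add1", "ADD.D", "F6", "F8", "F2")     # will overwrite F6

# ADD.D finishes first and writes F6; DIV.D already captured the old F6 value.
write_result(rs, reg_stat, regs, "Add1", rs["Add1"]["Vj"] + rs["Add1"]["Vk"])
ready_before = can_execute(rs, "Mult2")
write_result(rs, reg_stat, regs, "Mult1", rs["Mult1"]["Vj"] * rs["Mult1"]["Vk"])
ready_after = can_execute(rs, "Mult2")
print(regs["F6"], rs["Mult2"]["Vk"], ready_before, ready_after)
```

In this run ADD.D writes F6 before DIV.D has executed, yet `Mult2` still holds the old F6 value in `Vk`, which is how the issue-time copy removes the WAR hazard. `Mult2` becomes executable only after `Mult1` broadcasts its result.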

## Drawbacks of The Tomasulo Approach

- **Implementation Complexity:** 
  - Example: The implementation of the Tomasulo algorithm may have caused delays in the introduction of the 360/91, MIPS R10000, and IBM 620, among other CPUs.
- Many high-speed associative result stores (using the CDB) are required.
- Performance is limited by one Common Data Bus.
  - Possible solution: multiple CDBs → more functional unit and RS logic needed for parallel associative stores (even more complexity).

#### **Tomasulo Approach Example**

Using the same code as in the scoreboard example, run on the Tomasulo configuration given earlier (pipelined functional units):

|                                | # of RSs | EX Cycles |
|--------------------------------|----------|-----------|
| Integer                        | 3        | 1         |
| Floating Point Multiply/divide | 2        | 10/40     |
| Floating Point add             | 3        | 2         |

| # | Instruction | Operands    |
|---|-------------|-------------|
| 1 | L.D         | F6, 34(R2)  |
| 2 | L.D         | F2, 45(R3)  |
| 3 | MUL.D       | F0, F2, F4  |
| 4 | SUB.D       | F8, F6, F2  |
| 5 | DIV.D       | F10, F0, F6 |
| 6 | ADD.D       | F6, F8, F2  |

Dependencies marked in the original figure: real data dependence (RAW), anti-dependence (WAR), and output dependence (WAW).

L.D processing takes two cycles: EX, MEM (only one cycle in scoreboard example)

In Fourth Edition: Chapter 2.5 (In Third Edition: Chapter 3.3)



Initial state (i.e. at the end of cycle 0):

FP EX Cycles: Add = 2 cycles, Multiply = 10, Divide = 40

| Instruction status |  |  |  | Execution | Write |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | complete | Result |  |  | Busy | Address |  |  |
| L.D F6 | 34+ | R2 |  |  |  |  | Load1 | No |  |  |  |
| L.D F2 | 45+ | R3 |  |  |  |  | Load2 | No |  |  |  |
| MUL.D F0 | F2 | F4 |  |  |  |  | Load3 | No |  |  |  |
| SUB.D F8 | F6 | F2 |  |  |  |  |  |  |  |  |  |
| DIV.D F10 | F0 | F6 |  |  |  |  |  |  |  |  |  |
| ADD.D F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |  |  |  |  |
| 0 | Add1 | No |  |  |  |  |  |  |  |  |  |
| 0 | Add2 | No |  |  |  |  |  |  |  |  |  |
| 0 | Add3 | No |  |  |  |  |  |  |  |  |  |
| 0 | Mult1 | No |  |  |  |  |  |  |  |  |  |
| 0 | Mult2 | No |  |  |  |  |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 0 |  | FU |  |  |  |  |  |  |  |  |  |

| Instruction status |  |  |  | Execution | Write |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | complete | Result |  |  | Busy | Address |  |  |
| L.D F6 | 34+ | R2 | 1 |  |  |  | Load1 | Yes | 34+R2 |  |  |
| L.D F2 | 45+ | R3 |  |  |  |  | Load2 | No |  |  |  |
| MUL.D F0 | F2 | F4 |  |  |  |  | Load3 | No |  |  |  |
| SUB.D F8 | F6 | F2 |  |  |  |  |  |  |  |  |  |
| DIV.D F10 | F0 | F6 |  |  |  |  |  |  |  |  |  |
| ADD.D F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |  |  |  |  |
|  | Add1 | No |  |  |  |  |  |  |  |  |  |
| 0 | Add2 | No |  |  |  |  |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |  |  |  |  |
|  | Mult1 | No |  |  |  |  |  |  |  |  |  |
|  | Mult2 | No |  |  |  |  |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 1 |  | FU |  |  |  | Load1 |  |  |  |  |  |

- Cycle 1: Issue first load to the Load1 reservation station.
- Cycle 2: Issue second load to the Load2 reservation station.
  - Note: Unlike the 6600, can have multiple loads outstanding (the CDC 6600 only has one integer FU).
- Cycle 3: Issue MUL.D to reservation station Mult1.
- Cycle 4: Issue SUB.D. Load1 writes its result, i.e. register F6 has the loaded value from memory. Load2 completing; what is waiting for it?
- Cycle 5: Load2 result forwarded via CDB to Add1, Mult1 (SUB.D and MUL.D will start execution next cycle, 6). Issue DIV.D to the Mult2 reservation station.

| Instruction status |  |  |  | Execution | Write |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | complete | Result |  |  | Busy | Address |  |  |
| L.D F6 | 34+ | R2 | 1 | 3 | 4 |  | Load1 | No |  |  |  |
| L.D F2 | 45+ | R3 | 2 | 4 | 5 |  | Load2 | No |  |  |  |
| MUL.D F0 | F2 | F4 | 3 |  |  |  | Load3 | No |  |  |  |
| SUB.D F8 | F6 | F2 | 4 |  |  |  |  |  |  |  |  |
| DIV.D F10 | F0 | F6 | 5 |  |  |  |  |  |  |  |  |
| ADD.D F6 | F8 | F2 | 6 |  |  |  |  |  |  |  |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |  |  |  |  |
| 1 | Add1 | Yes | SUBD | M(34+R2) | M(45+R3) |  |  |  |  |  |  |
| 0 | Add2 | Yes | ADDD |  | M(45+R3) | Add1 |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |  |  |  |  |
| 9 | Mult1 | Yes | MULTD | M(45+R3) | R(F4) |  |  |  |  |  |  |
| 0 | Mult2 | Yes | DIVD |  | M(34+R2) | Mult1 |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 6 |  | FU | Mult1 | M(45+R3) |  | Add2 | Add1 | Mult2 |  |  |  |

(Time: execution cycles remaining.)

ADD.D is issued here (cycle 6), vs. cycle 13 in the scoreboard example.

| Instruction status |  |  |  | Execution | Write |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | complete | Result |  |  | Busy | Address |  |  |
| L.D F6 | 34+ | R2 | 1 | 3 | 4 |  | Load1 | No |  |  |  |
| L.D F2 | 45+ | R3 | 2 | 4 | 5 |  | Load2 | No |  |  |  |
| MUL.D F0 | F2 | F4 | 3 |  |  |  | Load3 | No |  |  |  |
| SUB.D F8 | F6 | F2 | 4 | 7 |  |  |  |  |  |  |  |
| DIV.D F10 | F0 | F6 | 5 |  |  |  |  |  |  |  |  |
| ADD.D F6 | F8 | F2 | 6 |  |  |  |  |  |  |  |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |  |  |  |  |
| 0 | Add1 | Yes | SUBD | M(34+R2) | M(45+R3) |  |  |  |  |  |  |
| 0 | Add2 | Yes | ADDD |  | M(45+R3) | Add1 |  |  |  |  |  |
|  | Add3 | No |  |  |  |  |  |  |  |  |  |
| 8 | Mult1 | Yes | MULTD | M(45+R3) | R(F4) |  |  |  |  |  |  |
| 0 | Mult2 | Yes | DIVD |  | M(34+R2) | Mult1 |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 7 |  | FU | Mult1 | M(45+R3) |  | Add2 | Add1 | Mult2 |  |  |  |

(Add1 is done executing: 0 execution cycles remaining.)

- RS Add1 completing; what is waiting for it?
- RS Add2 completed execution.
- Write back the result of ADD.D in this cycle. (What about the anti-dependence over F6 with DIV.D? Renaming removed it: Mult2 already holds the old F6 value, M(34+R2), in Vk.)
- Mult1 completed execution; what is waiting for it?
- Only the divide instruction remains; DIV.D execution will start next cycle (17).
- For comparison, the scoreboard took 62 cycles to complete the same code.
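
The cycle counts in this example can be cross-checked with a very small timing model. The sketch below is only that, a model under stated assumptions: one instruction issues per cycle in program order, a reservation station or load buffer is always free, functional units never conflict, an operand broadcast on the CDB in cycle *w* can be used from cycle *w* + 1, and no two results compete for the CDB. The names `tomasulo_timing`, `Timing`, `LATENCY`, and `PROGRAM` are hypothetical. Values not shown in the tables above are outputs of this model, not figures from the slides.

```python
from dataclasses import dataclass

# EX latencies from the example: L.D = 2 (EX, MEM), add/sub = 2, mul = 10, div = 40.
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}


@dataclass
class Timing:
    issue: int
    exec_complete: int
    write_result: int


def tomasulo_timing(program):
    """program: list of (op, dest, sources) in program order."""
    producer_write = {}   # register -> cycle its latest pending producer writes the CDB
    timings = []
    for n, (op, dest, sources) in enumerate(program, start=1):
        issue = n                                    # in-order, one issue per cycle
        # Sources are looked up at issue time, which is what renaming does.
        ready = max((producer_write.get(s, 0) for s in sources), default=0)
        start = max(issue, ready) + 1                # first execution cycle
        exec_complete = start + LATENCY[op] - 1
        write_result = exec_complete + 1             # broadcast on the CDB
        producer_write[dest] = write_result
        timings.append(Timing(issue, exec_complete, write_result))
    return timings


PROGRAM = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]

timings = tomasulo_timing(PROGRAM)
# The single-CDB assumption holds for this program: all write cycles differ.
assert len({t.write_result for t in timings}) == len(timings)
for (op, dest, _), t in zip(PROGRAM, timings):
    print(f"{op:6} {dest:4} issue={t.issue} exec_complete={t.exec_complete} write={t.write_result}")
```

The model reproduces the entries shown in the tables (loads completing in cycles 3 and 4 and writing in 4 and 5, SUB.D completing in cycle 7) and has DIV.D waiting for MUL.D's result in cycle 16, so its execution starts in cycle 17. Under these assumptions DIV.D writes its result in cycle 57.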



#### Tomasulo Loop Example

(Hardware-Based Version of Loop-Unrolling)

|       | Instruction | Operands     | Comment           |
|-------|-------------|--------------|-------------------|
| Loop: | L.D         | F0, 0(R1)    |                   |
|       | MUL.D       | F4, F0, F2   |                   |
|       | S.D         | F4, 0(R1)    |                   |
|       | DADDUI      | R1, R1, #-8  |                   |
|       | BNE         | R1, R2, Loop | branch if R1 ≠ R2 |

Note the independent loop iterations (the same loop used in the loop unrolling example).

- Assume FP multiply takes 4 execution clock cycles.
- Assume the first load takes 8 cycles (possibly due to a cache miss) and the second load takes 4 cycles (cache hit).
- Assume R1 = 80 initially.
- Assume DADDUI only takes one cycle (issue).
- Assume the branch is resolved in the issue stage (no EX or CDB write).
- Assume the branch is predicted taken with no branch misprediction, i.e. perfect branch prediction. (How? Target? What if the prediction is wrong?)
- No branch delay slot is used in this example.
- Stores take 4 cycles (ex, mem) and do not write on the CDB.
- We'll go over the execution to complete the first two loop iterations.



(i.e. at the end of cycle 0)

| Instruction status |  |  |  |  |  | Execution | Write |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction |  | j | k | Iteration | Issue | complete | Result |  | Busy | Address | Qi |
| L.D | F0 | 0 | R1 | 1 |  |  |  | Load1 | No |  |  |
| MUL.D | F4 | F0 | F2 | 1 |  |  |  | Load2 | No |  |  |
| S.D | F4 | 0 | R1 | 1 |  |  |  | Load3 | No |  |  |
| L.D | F0 | 0 | R1 | 2 |  |  |  | Store1 | No |  |  |
| MUL.D | F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |  |
| S.D | F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |  |
| Reservation Stations |  |  |  |  | S1 | S2 | RS for j | RS for k |  |  |  |
|  | Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |  |
|  | 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |  |
|  | 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |  |
|  | 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |  |
|  | 0 | Mult1 | No |  |  |  |  |  | DADDUI | R1, R1, #-8 |  |
|  | 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... F30 |
| 0 |  | 80 | Qi |  |  |  |  |  |  |  |  |



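| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 7 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 |  |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 |  |  |  | Store1 | No |  |
| MUL.D F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 2 | 80 | Qi | Load1 |  | Mult1 |  |  |  |  |

First MUL.D issues, wait on first L.D (Load1) to write on CDB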

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 6 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 |  |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 3 | 80 | Qi | Load1 |  | Mult1 |  |  |  |  |

First S.D issues, wait on first MUL.D (Mult1) to write on CDB

EECC551 - Shaaban

#58 lec # 4 Spring 2013 3-18-2013

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 5 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 |  |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 4 | **72** | Qi | Load1 |  | Mult1 |  |  |  |  |

First DADDUI issues (not shown)

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 4 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 |  |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 5 | 72 | Qi | Load1 |  | Mult1 |  |  |  |  |

First BNE issues (not shown), assumed predicted taken

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 3 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 4 | Yes 72 |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 | 6 |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 |  |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 6 | 72 | Qi | Load2 |  | Mult1 |  |  |  |  |

- Second L.D issues (to Load2; will take four EX cycles). Note: F0 never sees the Load1 result.
- WAW hazard between first and second L.D on F0 eliminated by register renaming.


| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 2 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 3 | Yes 72 |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 | 6 |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 | 7 |  |  | Store2 | No |  |
| S.D F4 | 0 | R1 | 2 |  |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | Yes | MULTD |  | R(F2) | Load2 |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 7 | 72 | Qi | Load2 |  | Mult2 |  |  |  |  |

- Second MUL.D issues (to RS Mult2) Note: F4 never sees Mult1 result
- WAW between first and second MUL.D on F4 eliminated by register renaming
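The renaming described in the two notes above can be sketched in a few lines. This is an illustrative model only, under the assumption that issue reads the register result status for each source and then overwrites the destination's entry with the issuing station's name; `issue`, `reg_status` and `stations` are hypothetical names, not part of the lecture.

```python
def issue(reg_status, stations, station, op, dest, sources):
    """Issue `op` to `station`: capture source tags first, then rename `dest`."""
    entry = {"op": op, "V": {}, "Q": {}}
    for src in sources:
        tag = reg_status.get(src)
        if tag is None:
            entry["V"][src] = f"R({src})"   # value already in the register file
        else:
            entry["Q"][src] = tag           # wait for this producer on the CDB
    stations[station] = entry
    if dest is not None:
        reg_status[dest] = station          # newest writer replaces the old tag
    return entry

reg_status, stations = {}, {}
issue(reg_status, stations, "Load1", "L.D", "F0", [])              # iteration 1
issue(reg_status, stations, "Mult1", "MUL.D", "F4", ["F0", "F2"])
issue(reg_status, stations, "Load2", "L.D", "F0", [])              # iteration 2
issue(reg_status, stations, "Mult2", "MUL.D", "F4", ["F0", "F2"])

print(reg_status)               # F0 and F4 now name the second-iteration producers
print(stations["Mult1"]["Q"])   # Mult1 still waits on Load1, not on F0 itself
```

Because Mult1 recorded the tag Load1 when it issued, overwriting the F0 entry with Load2 does not disturb it. This is why the WAW hazards on F0 and F4 cause no stall.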

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 |  |  | Load1 1 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 2 | Yes 72 |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 | 6 |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 | 7 |  |  | Store2 | Yes 72 | Mult2 |
| S.D F4 | 0 | R1 | 2 | 8 |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | Yes | MULTD |  | R(F2) | Load2 |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 8 | 72 | Qi | Load2 |  | Mult2 |  |  |  |  |

Second S.D issues (to RS Store2)

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 | 9 |  | Load1 0 | Yes 80 |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 1 | Yes 72 |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 | 6 |  |  | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 | 7 |  |  | Store2 | Yes 72 | Mult2 |
| S.D F4 | 0 | R1 | 2 | 8 |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD |  | R(F2) | Load1 |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | Yes | MULTD |  | R(F2) | Load2 |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 9 | 64 | Qi | Load2 |  | Mult2 |  |  |  |  |

- First load EX done.
- Issue second DADDUI (not shown).
- Load1 completing; what is waiting for it?





- Load1 result forwarded via CDB to Mult1; its execution will start next cycle (cycle 11).
- Issue second BNE (not shown).
- Load2 completing; what is waiting for it?
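The question "what is waiting for it?" is answered in hardware by comparing the broadcast tag against every pending Qj, Qk and store-buffer Qi field. A minimal sketch of that comparison follows; the `broadcast` function and the dictionary layout are assumptions made for illustration, and the state models only the first two iterations (the third-iteration L.D is left out).

```python
def broadcast(tag, value, stations, store_buffers, reg_status):
    """Deliver (tag, value) from the CDB to every consumer waiting on `tag`."""
    woken = []
    for name, rs in stations.items():
        for q_field, v_field in (("Qj", "Vj"), ("Qk", "Vk")):
            if rs.get(q_field) == tag:
                rs[q_field] = None        # operand is no longer pending
                rs[v_field] = value       # capture the value from the bus
                woken.append(name)
    for name, buf in store_buffers.items():
        if buf.get("Qi") == tag:
            buf["Qi"] = None
            buf["value"] = value
            woken.append(name)
    for reg in [r for r, t in reg_status.items() if t == tag]:
        del reg_status[reg]               # register file now holds the value
    return woken

# State before Load1 writes its result (first two iterations only).
stations = {
    "Mult1": {"Qj": "Load1", "Qk": None, "Vk": "R(F2)"},
    "Mult2": {"Qj": "Load2", "Qk": None, "Vk": "R(F2)"},
}
store_buffers = {"Store1": {"Qi": "Mult1"}, "Store2": {"Qi": "Mult2"}}
reg_status = {"F0": "Load2", "F4": "Mult2"}

woken_by_load1 = broadcast("Load1", "M(80)", stations, store_buffers, reg_status)
f0_after_load1 = reg_status.get("F0")
woken_by_load2 = broadcast("Load2", "M(72)", stations, store_buffers, reg_status)
print(woken_by_load1, f0_after_load1, woken_by_load2)
```

In this sketch only Mult1 picks up the Load1 result, and F0 still names Load2 afterwards, so F0 never receives the first load's value. Only Mult2 picks up the Load2 result.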



| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 | 9 | 10 | Load1 | No |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 3 | Yes 64 | Qi |
| L.D F0 | 0 | R1 | 2 | 6 | 10 | 11 | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 | 7 |  |  | Store2 | Yes 72 | Mult2 |
| S.D F4 | 0 | R1 | 2 | 8 |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 2 | Mult1 | Yes | MULTD | M(80) | R(F2) |  |  | DADDUI | R1, R1, #-8 |
| 3 | Mult2 | Yes | MULTD | M(72) | R(F2) |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 12 | 64 | Qi | Load3 |  | Mult2 |  |  |  |  |

Issue third iteration MUL.D?

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 | 9 | 10 | Load1 | No |  |
| MUL.D F4 | F0 | F2 | 1 | 2 |  |  | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 2 | Yes 64 | Qi |
| L.D F0 | 0 | R1 | 2 | 6 | 10 | 11 | Store1 | Yes 80 | Mult1 |
| MUL.D F4 | F0 | F2 | 2 | 7 |  |  | Store2 | Yes 72 | Mult2 |
| S.D F4 | 0 | R1 | 2 | 8 |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 1 | Mult1 | Yes | MULTD | M(80) | R(F2) |  |  | DADDUI | R1, R1, #-8 |
| 2 | Mult2 | Yes | MULTD | M(72) | R(F2) |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 13 | 64 | Qi | Load3 |  | Mult2 |  |  |  |  |

Issue third iteration MUL.D, S.D?
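The third-iteration MUL.D cannot issue while both multiplier reservation stations are busy, which is a structural hazard at the issue stage. Here is a small sketch of that check, using the hypothetical helper `find_free_station`. The cycle-13 state is taken from the table above; the cycle-16 state assumes Mult1 has been freed by its CDB write in cycle 15 while Mult2 is still writing.

```python
def find_free_station(busy, unit_prefix):
    """Return the first free station of a functional unit, or None (stall issue)."""
    for name in sorted(busy):
        if name.startswith(unit_prefix) and not busy[name]:
            return name
    return None

# Busy bits at cycle 13: both multiplier stations hold iteration 1 and 2 multiplies.
busy_at_13 = {"Add1": False, "Add2": False, "Add3": False, "Mult1": True, "Mult2": True}
# Assumed state just before issue in cycle 16: Mult1 wrote its result in cycle 15.
busy_at_16 = dict(busy_at_13, Mult1=False)

slot_13 = find_free_station(busy_at_13, "Mult")
slot_16 = find_free_station(busy_at_16, "Mult")
print(slot_13, slot_16)
```

With no free Mult station the in-order issue stage stalls, so the S.D behind the multiply cannot issue either, even though a store buffer is free.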



Mult1 completing; what is waiting for it?





- Mult2 completing; what is waiting for it?
- Third iteration L.D done execution

Issue third multiply?

| Instruction status |  |  |  |  | Execution | Write |  |  |  |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | iteration | Issue | complete | Result |  | Busy Address |  |
| L.D F0 | 0 | R1 | 1 | 1 | 9 | 10 | Load1 | No |  |
| MUL.D F4 | F0 | F2 | 1 | 2 | 14 | 15 | Load2 | No |  |
| S.D F4 | 0 | R1 | 1 | 3 |  |  | Load3 | No | Qi |
| L.D F0 | 0 | R1 | 2 | 6 | 10 | 11 | Store1 3 | Yes 80 | M(80)*R(F2) |
| MUL.D F4 | F0 | F2 | 2 | 7 | 15 | 16 | Store2 4 | Yes 72 | M(72)*R(F2) |
| S.D F4 | 0 | R1 | 2 | 8 |  |  | Store3 | No |  |
| Reservation Stations |  |  |  | S1 | S2 | RS for j | RS for k |  |  |
| Time | Name | Busy | Op | Vj | Vk | Qj | Qk | Code: |  |
| 0 | Add1 | No |  |  |  |  |  | L.D | F0, 0(R1) |
| 0 | Add2 | No |  |  |  |  |  | MUL.D | F4,F0,F2 |
| 0 | Add3 | No |  |  |  |  |  | S.D | F4, 0(R1) |
| 0 | Mult1 | Yes | MULTD | M(64) | R(F2) |  |  | DADDUI | R1, R1, #-8 |
| 0 | Mult2 | No |  |  |  |  |  | BNE | R1,R2,loop |
| Register result status |  |  |  |  |  |  |  |  |  |
| Clock | R1 |  | F0 | F2 | F4 | F6 | F8 | F10 F12 | ... F30 |
| 16 | 64 | Qi |  |  | Mult1 |  |  |  |  |

Issue third iteration MUL.D (to RS Mult1)



- Third iteration L.D writes on CDB (delayed one cycle due to CDB conflict).
- Issue third iteration S.D (to RS Store3).
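The one-cycle delay comes from the single common data bus: the second MUL.D and the third-iteration L.D both finish execution in cycle 15 and both want to write in cycle 16. A minimal arbitration sketch follows, assuming one CDB write per cycle and priority to the instruction that is earlier in program order. That priority rule is an assumption that happens to match the table, not a rule stated in the lecture, and `arbitrate_cdb` is a hypothetical name.

```python
def arbitrate_cdb(requests):
    """Assign CDB write cycles, one per cycle.

    `requests` is a list of (name, exec_complete_cycle) in program order;
    earlier entries win a conflict (assumed policy).
    """
    taken = set()
    granted = {}
    for name, exec_complete in requests:
        cycle = exec_complete + 1      # earliest possible write-result cycle
        while cycle in taken:          # bus already claimed: slip one cycle
            cycle += 1
        taken.add(cycle)
        granted[name] = cycle
    return granted

writes = arbitrate_cdb([("MUL.D#2", 15), ("L.D#3", 15)])
print(writes)
```

Here the second MUL.D keeps cycle 16 for its write and the third-iteration L.D slips to cycle 17, which is the delay noted above.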

| Instruction status   |       |      |           |       | Execution | Write    |          |       |         |                     |
|----------------------|-------|------|-----------|-------|-----------|----------|----------|-------|---------|---------------------|
| Instruction          | j     | k    | Iteration | Issue | complete  | Result   |          | Busy  | Address | Qi                  |
| L.D F0               | 0     | R1   | 1         | 1     | 9         | 10       | Load1    | No    |         |                     |
| MUL.D F4             | F0    | F2   | 1         | 2     | 14        | 15       | Load2    | No    |         |                     |
| S.D F4               | 0     | R1   | 1         | 3     |           |          | Load3    | No    |         |                     |
| L.D F0               | 0     | R1   | 2         | 6     | 10        | 11       | Store1 1 | Yes   | 80      | M(80)*R(F2)         |
| MUL.D F4             | F0    | F2   | 2         | 7     | 15        | 16       | Store2 2 | Yes   | 72      | M(72)*R(F2)         |
| S.D F4               | 0     | R1   | 2         | 8     |           |          | Store3   | Yes   | 64      | Mult1               |
| Reservation Stations |       |      |           | S1    | S2        | RS for j | RS for k |       |         |                     |
| Time                 | Name  | Busy | Op        | Vj    | Vk        | Qj       | Qk       | Code: |         |                     |
| 0                    | Add1  | No   |           |       |           |          |          | L.D   |         | F0, 0(R1)           |
| 0                    | Add2  | No   |           |       |           |          |          | MUL.D |         | F4,F0,F2            |
| 0                    | Add3  | No   |           |       |           |          |          | S.D   |         | F4, 0(R1)           |
| 3                    | Mult1 | Yes  | MULTD     | M(64) | R(F2)     |          |          | DADDUI |        | R1, R1, #-8 ← Issue |
| 0                    | Mult2 | No   |           |       |           |          |          | BNE   |         | R1,R2,loop          |
| Register result status |     |      |           |       |           |          |          |       |         |                     |
| Clock                | R1    |      | F0        | F2    | F4        | F6       | F8       | F10   | F12     | F30                 |
| 18                   | 56    | Qi   |           |       | Mult1     |          |          |       |         |                     |

**Issue third iteration DADDUI** 

#### (First Loop Iteration Done)

# **Loop Example Cycle 19**

| Instruction status   |       |      |           |       | Execution | Write    |          |       |         |                    |
|----------------------|-------|------|-----------|-------|-----------|----------|----------|-------|---------|--------------------|
| Instruction          | j     | k    | Iteration | Issue | complete  | Result   |          | Busy  | Address | Qi                 |
| L.D F0               | 0     | R1   | 1         | 1     | 9         | 10       | Load1    | No    |         |                    |
| MUL.D F4             | F0    | F2   | 1         | 2     | 14        | 15       | Load2    | No    |         |                    |
| S.D F4               | 0     | R1   | 1         | 3     | 19        |          | Load3    | No    |         |                    |
| L.D F0               | 0     | R1   | 2         | 6     | 10        | 11       | Store1 0 | No    |         |                    |
| MUL.D F4             | F0    | F2   | 2         | 7     | 15        | 16       | Store2 1 | Yes   | 72      | M(72)*R(F2)        |
| S.D F4               | 0     | R1   | 2         | 8     |           |          | Store3   | Yes   | 64      | Mult1              |
| Reservation Stations |       |      |           | S1    | S2        | RS for j | RS for k |       |         |                    |
| Time                 | Name  | Busy | Op        | Vj    | Vk        | Qj       | Qk       | Code: |         |                    |
| 0                    | Add1  | No   |           |       |           |          |          | L.D   |         | F0, 0(R1)          |
| 0                    | Add2  | No   |           |       |           |          |          | MUL.D |         | F4,F0,F2           |
| 0                    | Add3  | No   |           |       |           |          |          | S.D   |         | F4, 0(R1)          |
| 2                    | Mult1 | Yes  | MULTD     | M(64) | R(F2)     |          |          | DADDUI |        | R1, R1, #-8        |
| 0                    | Mult2 | No   |           |       |           |          |          | BNE   |         | R1,R2,loop ← Issue |
| Register result status |     |      |           |       |           |          |          |       |         |                    |
| Clock                | R1    |      | F0        | F2    | F4        | F6       | F8       | F10   | F12     | F30                |
| 19                   | 56    | Qi   |           |       | Mult1     |          |          |       |         |                    |

First S.D done (No write on CDB for stores). First loop iteration done. Issue third iteration BNE.

#### (First Two Loop Iterations Done)

## **Loop Example Cycle 20**

| Instruction status   |       |      |           |       | Execution | Write    |          |       |         |                     |
|----------------------|-------|------|-----------|-------|-----------|----------|----------|-------|---------|---------------------|
| Instruction          | j     | k    | Iteration | Issue | complete  | Result   |          | Busy  | Address | Qi                  |
| L.D F0               | 0     | R1   | 1         | 1     | 9         | 10       | Load1 4  | Yes   | 56      |                     |
| MUL.D F4             | F0    | F2   | 1         | 2     | 14        | 15       | Load2    | No    |         |                     |
| S.D F4               | 0     | R1   | 1         | 3     | 19        |          | Load3    | No    |         |                     |
| L.D F0               | 0     | R1   | 2         | 6     | 10        | 11       | Store1   | No    |         |                     |
| MUL.D F4             | F0    | F2   | 2         | 7     | 15        | 16       | Store2 0 | No    |         |                     |
| S.D F4               | 0     | R1   | 2         | 8     | 20        |          | Store3   | Yes   | 64      | Mult1               |
| Reservation Stations |       |      |           | S1    | S2        | RS for j | RS for k |       |         |                     |
| Time                 | Name  | Busy | Op        | Vj    | Vk        | Qj       | Qk       | Code: |         |                     |
| 0                    | Add1  | No   |           |       |           |          |          | L.D   |         | F0, 0(R1) ← Issue   |
| 0                    | Add2  | No   |           |       |           |          |          | MUL.D |         | F4,F0,F2            |
| 0                    | Add3  | No   |           |       |           |          |          | S.D   |         | F4, 0(R1)           |
| 1                    | Mult1 | Yes  | MULTD     | M(64) | R(F2)     |          |          | DADDUI |        | R1, R1, #-8         |
| 0                    | Mult2 | No   |           |       |           |          |          | BNE   |         | R1,R2,loop          |
| Register result status |     |      |           |       |           |          |          |       |         |                     |
| Clock                | R1    |      | F0        | F2    | F4        | F6       | F8       | F10   | F12     | F30                 |
| 20                   | 56    | Qi   | Load1     |       | Mult1     |          |          |       |         |                     |

Second S.D done (No write on CDB for stores). Second loop iteration done. Issue fourth iteration L.D (to RS Load1).

| Instruction status   |       |      |           |       | Execution | Write    |          |       |         |                     |
|----------------------|-------|------|-----------|-------|-----------|----------|----------|-------|---------|---------------------|
| Instruction          | j     | k    | Iteration | Issue | complete  | Result   |          | Busy  | Address | Qi                  |
| L.D F0               | 0     | R1   | 1         | 1     | 9         | 10       | Load1 3  | Yes   | 56      |                     |
| MUL.D F4             | F0    | F2   | 1         | 2     | 14        | 15       | Load2    | No    |         |                     |
| S.D F4               | 0     | R1   | 1         | 3     | 19        |          | Load3    | No    |         |                     |
| L.D F0               | 0     | R1   | 2         | 6     | 10        | 11       | Store1   | No    |         |                     |
| MUL.D F4             | F0    | F2   | 2         | 7     | 15        | 16       | Store2   | No    |         |                     |
| S.D F4               | 0     | R1   | 2         | 8     | 20        |          | Store3   | Yes   | 64      | Mult1               |
| Reservation Stations |       |      |           | S1    | S2        | RS for j | RS for k |       |         |                     |
| Time                 | Name  | Busy | Op        | Vj    | Vk        | Qj       | Qk       | Code: |         |                     |
| 0                    | Add1  | No   |           |       |           |          |          | L.D   |         | F0, 0(R1)           |
| 0                    | Add2  | No   |           |       |           |          |          | MUL.D |         | F4,F0,F2 ← Issue    |
| 0                    | Add3  | No   |           |       |           |          |          | S.D   |         | F4, 0(R1)           |
| 0 (EX done)          | Mult1 | Yes  | MULTD     | M(64) | R(F2)     |          |          | DADDUI |        | R1, R1, #-8         |
| 0                    | Mult2 | Yes  | MULTD     |       | R(F2)     | Load1    |          | BNE   |         | R1,R2,loop          |
| Register result status |     |      |           |       |           |          |          |       |         |                     |
| Clock                | R1    |      | F0        | F2    | F4        | F6       | F8       | F10   | F12     | F30                 |
| 21                   | 56    | Qi   | Load1     |       | Mult1     |          |          |       |         |                     |

Mult1 (third iteration MUL.D) completing; what is waiting for it?
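One way to answer this question is to scan every place that can hold a producer tag: the Qj/Qk fields of the reservation stations, the Qi field of the store buffers, and the register result status. The sketch below does that for the cycle 21 state shown in the table. The function name `waiting_on` and the dictionary layout are illustrative assumptions, not something defined in the slides.

```python
def waiting_on(tag, stations, store_buffers, reg_status):
    """Return the names of all entries waiting for result `tag` on the CDB."""
    waiters = []
    for name, rs in stations.items():
        # A reservation station waits if either source operand carries the tag.
        if tag in (rs.get("Qj"), rs.get("Qk")):
            waiters.append(name)
    for name, buf in store_buffers.items():
        # A store buffer waits if the value to be stored carries the tag.
        if buf.get("Qi") == tag:
            waiters.append(name)
    for reg, producer in reg_status.items():
        # A register waits if its result status names the tag.
        if producer == tag:
            waiters.append(reg)
    return waiters


# Cycle 21 state, copied from the table (only the entries relevant to Mult1).
stations = {
    "Mult1": {"Qj": None, "Qk": None},
    "Mult2": {"Qj": "Load1", "Qk": None},
}
store_buffers = {"Store3": {"Address": 64, "Qi": "Mult1"}}
reg_status = {"F4": "Mult1"}

print(waiting_on("Mult1", stations, store_buffers, reg_status))
```

For the state in the table this reports the third-iteration store (Store3) and register F4.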

Issue fourth iteration MUL.D (to RS Mult2)
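When the MUL.D is issued, each source register is looked up in the register result status: a pending producer is recorded as a tag (Qj or Qk), otherwise the register value is captured (Vj or Vk). A minimal sketch of that operand-capture step, assuming a dictionary model of the register result status; the helper name `capture_operands` is hypothetical:

```python
def capture_operands(sources, reg_status):
    """Fill the V/Q fields of a reservation station at issue.

    sources: {"j": register, "k": register}
    reg_status: {register: producing RS or load buffer}, pending results only.
    """
    fields = {}
    for slot, reg in sources.items():
        producer = reg_status.get(reg)
        if producer is not None:
            fields["Q" + slot] = producer      # wait for this tag on the CDB
        else:
            fields["V" + slot] = f"R({reg})"   # value read from the register file
    return fields


# Fourth-iteration MUL.D F4,F0,F2: F0 is still being loaded by Load1.
mult2 = capture_operands({"j": "F0", "k": "F2"}, {"F0": "Load1", "F4": "Mult1"})
print(mult2)
```

This reproduces the Mult2 row of the table: Qj = Load1 and Vk = R(F2).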

EECC551 - Shaaban

# **Tomasulo Loop Example Timing Diagram**

| Iteration | Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
|-----------|-------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|
| 1 | L.D    | I | E | E | E | E | E | E | E | E | W |  |  |  |  |  |  |  |  |  |  |  |
|   | MUL.D  |  | I |  |  |  |  |  |  |  |  | E | E | E | E | W |  |  |  |  |  |  |
|   | S.D    |  |  | I |  |  |  |  |  |  |  |  |  |  |  |  | E | E | E | E |  |  |
|   | DADDUI |  |  |  | I |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|   | BNE    |  |  |  |  | I |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 2 | L.D    |  |  |  |  |  | I | E | E | E | E | W |  |  |  |  |  |  |  |  |  |  |
|   | MUL.D  |  |  |  |  |  |  | I |  |  |  |  | E | E | E | E | W |  |  |  |  |  |
|   | S.D    |  |  |  |  |  |  |  | I |  |  |  |  |  |  |  |  | E | E | E | E |  |
|   | DADDUI |  |  |  |  |  |  |  |  | I |  |  |  |  |  |  |  |  |  |  |  |  |
|   | BNE    |  |  |  |  |  |  |  |  |  | I |  |  |  |  |  |  |  |  |  |  |  |
| 3 | L.D    |  |  |  |  |  |  |  |  |  |  | I | E | E | E | E |  | W |  |  |  |  |
|   | MUL.D  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I |  | E | E | E | E |
|   | S.D    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I |  |  |  |  |
|   | DADDUI |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I |  |  |  |
|   | BNE    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I |  |  |
| 4 | L.D    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I | E |
|   | MUL.D  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | I |
|   | S.D    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|   | DADDUI |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|   | BNE    |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

Columns 1 to 21 are clock cycles. I = Issue, E = Execute, W = Write Result on CDB

- 3rd L.D write delayed one cycle
- 3rd MUL.D issue delayed until mul RS is available
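The issue cycles of the third iteration in the diagram follow from two rules: instructions issue in program order, one per cycle, and an instruction stalls at issue until a reservation station of the type it needs is free. The sketch below reproduces those cycles. It assumes that a station released by a write in cycle c can be reused from cycle c+1, and that load/store buffers and the integer unit never block issue in this example. The function name `in_order_issue` and its parameters are illustrative only.

```python
def in_order_issue(start_cycle, program, free_from):
    """Compute issue cycles for in-order, single-issue Tomasulo.

    program: list of (instruction, rs_type), with rs_type None if no stall is possible.
    free_from: {rs_type: first cycle in which a station of that type is free}.
    """
    cycle = start_cycle
    issued = {}
    for instr, rs_type in program:
        if rs_type is not None:
            # Structural hazard: wait until a station of this type is free.
            cycle = max(cycle, free_from[rs_type])
        issued[instr] = cycle
        cycle += 1                      # single issue: next instruction, next cycle
    return issued


# Third iteration: the L.D issues in cycle 11. Mult1 is released when the
# first MUL.D writes its result in cycle 15, so it is free from cycle 16.
third = in_order_issue(
    11,
    [("L.D", None), ("MUL.D", "mult"), ("S.D", None), ("DADDUI", None), ("BNE", None)],
    {"mult": 16},
)
print(third)
```

The computed cycles (11, 16, 17, 18, 19) match the I entries of iteration 3, and they show how a single stalled MUL.D holds back every later instruction because issue is in order.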