# INTRODUCTION TO ADVANCED PIPELINE

1 May 2020

Dr Noor Mahammad Sk

#### Review: Summary of Pipelining Basics

- Hazards limit performance
  - Structural: need more HW resources
  - Data: need forwarding, compiler scheduling
  - Control: early evaluation & PC, delayed branch, prediction
- Increasing length of pipe increases impact of hazards; pipelining helps instruction bandwidth, not latency
- Interrupts, instruction set, and FP make pipelining harder
- Compilers reduce cost of data and control hazards
  - Load delay slots
  - Branch delay slots
  - Branch prediction
- Today: Longer pipelines (R4000) → more instruction-level parallelism → SW and HW loop unrolling

### Case Study: MIPS R4000 (200MHz)

- 8 Stage Pipeline:
  - IF—first half of fetching of instruction; PC selection happens here as well as initiation of instruction cache access.
  - IS—second half of access to instruction cache.
  - RF—instruction decode and register fetch, hazard checking and also instruction cache hit detection.
  - EX—execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
  - DF—data fetch, first half of access to data cache.
  - DS—second half of access to data cache.
  - TC-tag check, determine whether the data cache access hit.
  - WB—write back for loads and register-register operations.
- 8 Stages: What is impact on Load delay? Branch delay? Why?

## Case Study: MIPS R4000

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| i | IF | IS | RF | EX | DF | DS | TC | WB |
| i+1 | | IF | IS | RF | EX | DF | DS | TC |
| i+2 | | | IF | IS | RF | EX | DF | DS |
| i+3 | | | | IF | IS | RF | EX | DF |
| i+4 | | | | | IF | IS | RF | EX |

- TWO cycle load latency
- THREE cycle branch latency (conditions evaluated during EX phase)
  - Delay slot plus two stalls
  - Branch likely cancels delay slot if not taken
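The two latencies can be read off the stage list with a little arithmetic. The sketch below is only an illustration under two assumptions: load data is forwarded at the end of DS (before the tag check in TC), and the branch outcome is known at the end of EX. The helper name `delay` is hypothetical.

```python
# The eight R4000 pipeline stages, in order.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def delay(ready_at_end_of, needed_at_start_of):
    """Cycles a dependent instruction must wait beyond back-to-back issue.

    Assumption: the value is forwarded as soon as the producing stage ends.
    """
    return STAGES.index(ready_at_end_of) - STAGES.index(needed_at_start_of)

# Load: data leaves the cache at the end of DS, consumer needs it in EX.
load_delay = delay("DS", "EX")
# Branch: condition resolved in EX, the next fetch starts in IF.
branch_delay = delay("EX", "IF")

print(load_delay, branch_delay)  # 2 3
```

The deeper cache access (DF, DS) is what stretches the load delay to two cycles, and the split fetch (IF, IS) plus RF is what stretches the branch delay to three.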

### MIPS R4000 Floating Point

- FP Adder, FP Multiplier, FP Divider
- Last step of FP Multiplier/Divider uses FP Adder HW
- 8 kinds of stages in FP units:

| Stage | Functional unit | Description                |
|-------|-----------------|----------------------------|
| A     | FP adder        | Mantissa ADD stage         |
| D     | FP divider      | Divide pipeline stage      |
| E     | FP multiplier   | Exception test stage       |
| M     | FP multiplier   | First stage of multiplier  |
| N     | FP multiplier   | Second stage of multiplier |
| R     | FP adder        | Rounding stage             |
| S     | FP adder        | Operand shift stage        |
| U     |                 | Unpack FP numbers          |

## MIPS FP Pipe Stages

| FP instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... |
|---|---|---|---|---|---|---|---|---|---|
| Add, Subtract | U | S+A | A+R | R+S | | | | | |
| Multiply | U | E+M | M | M | M | N | N+A | R | |
| Divide | U | A | R | $D^{28}$ | ... | D+A | D+R | D+A | D+R, A, R |
| Square root | U | E | $(A+R)^{108}$ | ... | A | R | | | |
| Negate | U | S | | | | | | | |
| Absolute value | U | S | | | | | | | |
| FP compare | U | A | R | | | | | | |

Stages: A = mantissa ADD stage, D = divide pipeline stage, E = exception test stage, M = first stage of multiplier, N = second stage of multiplier, R = rounding stage, S = operand shift stage, U = unpack FP numbers.

#### FP Loop: Where are the Hazards?

| Loop: | LD   | F0,0(R1) | ;F0=vector element         |
|-------|------|----------|----------------------------|
|       | ADDD | F4,F0,F2 | ;add scalar from F2        |
|       | SD   | 0(R1),F4 | ;store result              |
|       | SUBI | R1,R1,8  | ;decrement pointer 8B (DW) |
|       | BNEZ | R1,Loop  | ;branch R1!=zero           |
|       | NOP  |          | ;delayed branch slot       |

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |
| Load double                  | Store double             | 0                       |
| Integer op                   | Integer op               | 0                       |

Where are the stalls?

### **FP Loop Showing Stalls**

| 1 | Loop: | LD    | F0, 0(R1)  | ;F0=vector element          |
|---|-------|-------|------------|-----------------------------|
| 2 |       | Stall |            |                             |
| 3 |       | ADDD  | F4, F0, F2 | ;add scalar from F2         |
| 4 |       | Stall |            |                             |
| 5 |       | Stall |            |                             |
| 6 |       | SD    | 0(R1), F4  | ;store result               |
| 7 |       | SUBI  | R1, R1, 8  | ; decrement pointer 8B (DW) |
| 8 |       | BNEZ  | R1, Loop   | ;branch R1 != zero          |
| 9 |       | Stall |            | ;delayed branch slot        |

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |

### Revised FP Loop Minimizing Stalls

```
1 Loop: LD    F0, 0(R1)
2       Stall
3       ADDD  F4, F0, F2
4       SUBI  R1, R1, 8
5       BNEZ  R1, Loop     ; delayed branch
6       SD    8(R1), F4    ; altered when moved past SUBI
```

#### Replace BNEZ stall with SD by changing address of SD

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op                    | Another FP ALU op        | 3                       |
| FP ALU op                    | Store double             | 2                       |
| Load double                  | FP ALU op                | 1                       |

The loop with stalls takes 9 clocks per iteration; the revised schedule above takes 6.
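The clock counts for the stalled and the rescheduled loop can be reproduced with a small in-order issue model driven by the latency table. This is a rough sketch, not a pipeline simulator: the instruction classes (`load`, `fp_alu`, `store`, `int`) and the function name `count_clocks` are hypothetical, branches are treated as integer ops, and pairs missing from the table are assumed to need no stall.

```python
# (producer class, consumer class) -> stall cycles, from the latency table.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
}

def count_clocks(program):
    """program: list of (class, dest, sources). Returns clocks for one pass."""
    produced = {}  # register -> (producer class, clock at which it issued)
    clock = 0
    for kind, dest, sources in program:
        earliest = clock + 1  # in-order issue: at best one per clock
        for reg in sources:
            if reg in produced:
                prod_kind, prod_clock = produced[reg]
                gap = LATENCY.get((prod_kind, kind), 0)  # assumption: 0 if unlisted
                earliest = max(earliest, prod_clock + gap + 1)
        clock = earliest
        if dest is not None:
            produced[dest] = (kind, clock)
    return clock

ORIGINAL = [
    ("load", "F0", ["R1"]),          # LD   F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADDD F4,F0,F2
    ("store", None, ["R1", "F4"]),   # SD   0(R1),F4
    ("int", "R1", ["R1"]),           # SUBI R1,R1,8
    ("int", None, ["R1"]),           # BNEZ R1,Loop
    ("int", None, []),               # NOP in the delayed branch slot
]
REVISED = [
    ("load", "F0", ["R1"]),          # LD   F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADDD F4,F0,F2
    ("int", "R1", ["R1"]),           # SUBI R1,R1,8
    ("int", None, ["R1"]),           # BNEZ R1,Loop
    ("store", None, ["R1", "F4"]),   # SD   8(R1),F4 fills the delay slot
]

print(count_clocks(ORIGINAL), count_clocks(REVISED))  # 9 6
```

Moving SD into the delay slot hides both the ADDD-to-SD latency and the branch slot, which is where the three saved clocks come from.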

# Unroll Loop Four Times (straightforward way)

```
1  Loop: LD    F0,0(R1)
2        ADDD  F4,F0,F2
3        SD    0(R1),F4       ;drop SUBI & BNEZ
4        LD    F6,-8(R1)
5        ADDD  F8,F6,F2
6        SD    -8(R1),F8      ;drop SUBI & BNEZ
7        LD    F10,-16(R1)
8        ADDD  F12,F10,F2
9        SD    -16(R1),F12    ;drop SUBI & BNEZ
10       LD    F14,-24(R1)
11       ADDD  F16,F14,F2
12       SD    -24(R1),F16
13       SUBI  R1,R1,#32      ;alter to 4*8
14       BNEZ  R1,Loop
15       NOP
```

$$15 + 4 \times (1+2) = 27$$

clock cycles: 15 instruction slots plus, for each of the 4 copies, 1 stall after LD and 2 stalls after ADDD.

Rewrite loop to minimize stalls?
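At source level, the unrolling above amounts to the transformation sketched below. It is an illustration only: the function names are hypothetical, and the unrolled version assumes the element count is a multiple of 4, which the assembly version also relies on since it has no cleanup loop.

```python
def add_scalar(x, s):
    """Rolled loop: one element per iteration, walking down from the end."""
    i = len(x) - 1
    while i >= 0:
        x[i] = x[i] + s
        i -= 1
    return x

def add_scalar_unrolled4(x, s):
    """Same loop unrolled four times (assumes len(x) is a multiple of 4)."""
    assert len(x) % 4 == 0
    i = len(x) - 1
    while i >= 0:
        x[i] = x[i] + s
        x[i - 1] = x[i - 1] + s
        x[i - 2] = x[i - 2] + s
        x[i - 3] = x[i - 3] + s
        i -= 4  # one pointer update and one branch per four elements
    return x

data = [float(v) for v in range(8)]
print(add_scalar(list(data), 2.0) == add_scalar_unrolled4(list(data), 2.0))  # True
```

The loop overhead (pointer update and branch) is paid once per four elements instead of once per element, and the four independent bodies give the scheduler more instructions to interleave.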

#### Unrolled Loop That Minimizes Stalls

| 1  | Loop: | LD   | F0,0(R1)    |              |
|----|-------|------|-------------|--------------|
| 2  |       | LD   | F6,-8(R1)   |              |
| 3  |       | LD   | F10,-16(R1) |              |
| 4  |       | LD   | F14,-24(R1) |              |
| 5  |       | ADDD | F4,F0,F2    |              |
| 6  |       | ADDD | F8,F6,F2    |              |
| 7  |       | ADDD | F12,F10,F2  |              |
| 8  |       | ADDD | F16,F14,F2  |              |
| 9  |       | SD   | 0(R1),F4    |              |
| 10 |       | SD   | -8(R1),F8   |              |
| 11 |       | SD   | -16(R1),F12 |              |
| 12 |       | SUBI | R1,R1,#32   |              |
| 13 |       | BNEZ | R1,Loop     |              |
| 14 |       | SD   | 8(R1),F16   | ; 8-32 = -24 |

14 clock cycles for four elements, or 3.5 per element

- What assumptions were made when the code was moved?
  - OK to move store past SUBI even though SUBI changes the register
  - OK to move loads before stores: get right data?
  - When is it safe for compiler to do such changes?

When safe to move instructions?

- Definitions: compiler concerned about dependencies in program; whether or not a dependence causes a HW hazard depends on the given pipeline
- Try to schedule to avoid hazards
- (True) Data dependencies (RAW if a hazard for HW): instruction j is data dependent on instruction i if either
  - Instruction i produces a result used by instruction j, or
  - Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
- If dependent, can't execute in parallel
- Easy to determine for registers (fixed names)
- Hard for memory:
  - Does 100(R4) = 20(R6)?
  - From different loop iterations, does 20(R6) = 20(R6)?

### Where are the data dependencies?

```
1 Loop: LD F0,0(R1)
2 ADDD F4,F0,F2
3 SUBI R1,R1,8
4 BNEZ R1,Loop ;delayed branch
5 SD 8(R1),F4 ;altered when move past SUBI
```

- Another kind of dependence called name dependence: two instructions use same name (register or memory location) but don't exchange data
- Antidependence (WAR if a hazard for HW)
  - Instruction j writes a register or memory location that instruction i reads from and instruction i is executed first
- Output dependence (WAW if a hazard for HW)
  - Instruction i and instruction j write the same register or memory location; ordering between instructions must be preserved.
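The three register dependence kinds can be told apart mechanically from each instruction's destination and sources. The sketch below is a simplification: it looks at registers only (memory locations are ignored), it compares just one pair of instructions without checking for intervening writes, and the function name `dependences` is hypothetical.

```python
def dependences(i, j):
    """Classify dependences from earlier instruction i to later instruction j.

    Each instruction is (dest, sources); dest is None for stores and branches.
    """
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    found = set()
    if i_dest is not None and i_dest in j_srcs:
        found.add("RAW")  # true data dependence: j reads what i writes
    if j_dest is not None and j_dest in i_srcs:
        found.add("WAR")  # antidependence: j overwrites what i reads
    if i_dest is not None and i_dest == j_dest:
        found.add("WAW")  # output dependence: both write the same name
    return found

ld1 = ("F0", ["R1"])         # LD   F0,0(R1)
addd = ("F4", ["F0", "F2"])  # ADDD F4,F0,F2
ld2 = ("F0", ["R1"])         # LD   F0,-8(R1)

print(dependences(ld1, addd))  # {'RAW'}
print(dependences(addd, ld2))  # {'WAR'}
print(dependences(ld1, ld2))   # {'WAW'}
```

Only the first of these carries data; the other two exist purely because the name F0 is reused.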

#### Where are the name dependencies?

```
1  Loop: LD    F0,0(R1)
2        ADDD  F4,F0,F2
3        SD    0(R1),F4      ;drop SUBI & BNEZ
4        LD    F0,-8(R1)
5        ADDD  F4,F0,F2
6        SD    -8(R1),F4     ;drop SUBI & BNEZ
7        LD    F0,-16(R1)
8        ADDD  F4,F0,F2
9        SD    -16(R1),F4    ;drop SUBI & BNEZ
10       LD    F0,-24(R1)
11       ADDD  F4,F0,F2
12       SD    -24(R1),F4
13       SUBI  R1,R1,#32     ;alter to 4*8
14       BNEZ  R1,Loop
15       NOP
```

How can we remove them?

```
1  Loop: LD    F0,0(R1)
2        ADDD  F4,F0,F2
3        SD    0(R1),F4
4        LD    F6,-8(R1)
5        ADDD  F8,F6,F2
6        SD    -8(R1),F8
7        LD    F10,-16(R1)
8        ADDD  F12,F10,F2
9        SD    -16(R1),F12
10       LD    F14,-24(R1)
11       ADDD  F16,F14,F2
12       SD    -24(R1),F16
13       SUBI  R1,R1,#32
14       BNEZ  R1,Loop
15       NOP
```

This is called "register renaming".
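Renaming can be sketched as a single pass that gives every register write a fresh name and redirects later reads to the newest name. This is a simplified illustration for a straight-line body only: the temporary names `T0, T1, ...` and the function `rename_registers` are hypothetical, and a real compiler would pick free architectural registers (F6, F8, ... in the listing above) rather than unlimited temporaries.

```python
from itertools import count

def rename_registers(program, prefix="T"):
    """program: list of (op, dest, sources). Returns a renamed copy."""
    latest = {}      # original register name -> most recent renamed name
    fresh = count()  # supplies T0, T1, T2, ...
    renamed = []
    for op, dest, sources in program:
        # Reads use the newest name of each source (unchanged if never written).
        new_sources = [latest.get(reg, reg) for reg in sources]
        new_dest = dest
        if dest is not None:
            new_dest = f"{prefix}{next(fresh)}"
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed

body = [
    ("LD", "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD", None, ["R1", "F4"]),
] * 2  # two unrolled copies that reuse F0 and F4

for instr in rename_registers(body):
    print(instr)
```

After the pass no two instructions write the same name, so the WAR and WAW dependences between the copies disappear while every RAW chain (load, add, store) is preserved.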

- Again, name dependencies are hard for memory accesses
  - Does 100(R4) = 20(R6)?
  - From different loop iterations, does 20(R6) = 20(R6)?
- Our example required compiler to know that if R1 doesn't change then:

$$0(R1) \neq -8(R1) \neq -16(R1) \neq -24(R1)$$

There were no dependencies between some loads and stores, so they could be moved past each other.
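The reasoning the compiler needs here can be written as a conservative alias test. The sketch assumes 8-byte (double word) accesses and that the base register is not modified between the two accesses; `may_alias` is a hypothetical helper, not part of any real compiler.

```python
def may_alias(a, b, width=8):
    """a, b: (base_register, offset). True unless provably disjoint."""
    base_a, off_a = a
    base_b, off_b = b
    if base_a != base_b:
        # Nothing is known about how two different base registers relate.
        return True
    # Same unchanged base: the accesses overlap only if the offsets are close.
    return abs(off_a - off_b) < width

print(may_alias(("R1", 0), ("R1", -8)))    # False: safe to reorder
print(may_alias(("R4", 100), ("R6", 20)))  # True: must assume a dependence
```

Answering "maybe" whenever the bases differ is what keeps the test safe; it can only cost missed reordering opportunities, never wrong code.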

- Final kind of dependence called control dependence
- Example

```
if p1 {S1;};
if p2 {S2;};
```

S1 is control dependent on p1 and S2 is control dependent on p2 but not on p1.

- Two (obvious) constraints on control dependences:
  - An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch.
  - An instruction that is not control dependent on a branch cannot be moved to after the branch so that its execution is controlled by the branch.
- Control dependences can be relaxed to get parallelism; the program gets the same effect as long as we preserve the order of exceptions (address in register checked by branch before use) and the data flow (value in register depends on branch)

#### Where are the control dependencies?

| 1 Loop: | LD   | F0,0(R1) |
|---------|------|----------|
| 2       | ADDD | F4,F0,F2 |
| 3       | SD   | 0(R1),F4 |
| 4       | SUBI | R1,R1,8  |
| 5       | BEQZ | R1,exit  |
| 6       | LD   | F0,0(R1) |
| 7       | ADDD | F4,F0,F2 |
| 8       | SD   | 0(R1),F4 |
| 9       | SUBI | R1,R1,8  |
| 10      | BEQZ | R1,exit  |
| 11      | LD   | F0,0(R1) |
| 12      | ADDD | F4,F0,F2 |
| 13      | SD   | 0(R1),F4 |
| 14      | SUBI | R1,R1,8  |
| 15      | BEQZ | R1,exit  |

### When Safe to Unroll Loop?

- Example: Where are data dependencies? (A, B, C distinct & nonoverlapping)

  `for (i=1; i<=100; i=i+1) { A[i+1] = A[i] + C[i]; /* S1 */ B[i+1] = B[i] + A[i+1]; /* S2 */ }`

  1. S2 uses the value, A[i+1], computed by S1 in the same iteration.
  2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1] which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1].
  - This is a "loop-carried dependence": between iterations
- Implies that iterations are dependent, and can't be executed in parallel
- Not the case for our prior example; each iteration was distinct
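One informal way to see the difference is to run the iterations in a different order and compare results. The sketch below does that for statement S1 and for the earlier add-a-scalar loop; the array contents and function names are made up for illustration, and reordering is only a symptom check, not a proof of independence.

```python
def recurrence(order):
    """S1 from the example: A[i+1] = A[i] + C[i]."""
    a = [0] * 102
    a[1] = 1
    c = list(range(102))
    for i in order:
        a[i + 1] = a[i] + c[i]  # reads the value written by iteration i-1
    return a

def add_scalar(order):
    """The earlier loop: each iteration touches only its own element."""
    x = list(range(102))
    for i in order:
        x[i] = x[i] + 5
    return x

forward = range(1, 101)
backward = range(100, 0, -1)

print(recurrence(forward) == recurrence(backward))  # False: loop-carried
print(add_scalar(forward) == add_scalar(backward))  # True: independent
```

Because the iterations of the second loop give the same answer in any order, they can be unrolled and interleaved freely; the first loop has to respect the chain through A.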

#### HW Schemes: Instruction Parallelism

- Why in HW at run time?
  - Works when can't know real dependence at compile time
  - Compiler simpler
  - Code for one machine runs well on another
- Key idea: Allow instructions behind stall to proceed

```
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
```

- Enables out-of-order execution → out-of-order completion
- ID stage checked both for structural and data hazards
- Scoreboard dates to CDC 6600 in 1963

#### HW Schemes: Instruction Parallelism

- Out-of-order execution divides ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
- Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions
- CDC 6600: in-order issue, out-of-order execution, out-of-order commit (also called completion)

#### Scoreboard Implications

- Out-of-order completion → WAR, WAW hazards?
- Solutions for WAR
  - Queue both the operation and copies of its operands
  - Read registers only during Read Operands stage
- For WAW, must detect hazard: stall until other completes
- Need to have multiple instructions in execution phase → multiple execution units or pipelined execution units
- Scoreboard keeps track of dependencies and the state of operations
- Scoreboard replaces ID, EX, WB with 4 stages

#### Four Stages of Scoreboard Control

1. Issue: decode instructions & check for structural hazards (ID1)

   If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.

2. Read operands: wait until no data hazards, then read operands (ID2)

   A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

3. Execution: operate on operands (EX)

   The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)

   Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If none, it writes results. If WAR, then it stalls the instruction.

#### Example:

DIVD F0,F2,F4

ADDD F10,F0,F8

SUBD **F8**,F8,F14

CDC 6600 scoreboard would stall SUBD in the Write Result stage until ADDD reads its operands (WAR hazard on F8)

#### Three Parts of the Scoreboard

1. Instruction status: which of 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
   - Busy: indicates whether the unit is busy or not
   - Op: operation to perform in the unit (e.g., + or –)
   - Fi: destination register
   - Fj, Fk: source-register numbers
   - Qj, Qk: functional units producing source registers Fj, Fk
   - Rj, Rk: flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register

#### Detailed Scoreboard Pipeline Control

| Instruction status | Wait until | Bookkeeping |
|--------------------|------------|-------------|
| Issue              | Not Busy(FU) and not Result(D) | Busy(FU)← Yes; Op(FU)← op;<br>Fi(FU)← D; Fj(FU)← S1;<br>Fk(FU)← S2; Qj← Result(S1);<br>Qk← Result(S2); Rj← not Qj;<br>Rk← not Qk; Result(D)← FU |
| Read operands      | Rj and Rk | Rj← No; Rk← No |
| Execution complete | Functional unit done | |
| Write result       | $\forall$ f((Fj(f) $\neq$ Fi(FU) or Rj(f) = No) & (Fk(f) $\neq$ Fi(FU) or Rk(f) = No)) | ∀f(if Qj(f)=FU then Rj(f)← Yes);<br>∀f(if Qk(f)=FU then Rk(f)← Yes);<br>Result(Fi(FU))← 0; Busy(FU)← No |
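The wait conditions and bookkeeping in the table can be turned into a small simulator. What follows is a minimal sketch, not the CDC 6600 design: every decision in a clock is taken from the state at the start of that clock, loads are assumed to spend 1 clock executing in the integer unit, and the FP latencies (add 2, multiply 10, divide 40) are parameters. The names `simulate`, `UNITS`, `LATENCY`, and `PROGRAM` are hypothetical; the instruction sequence is the one used in the scoreboard example of these slides.

```python
def simulate(program, units, latency, limit=1000):
    """Return, per instruction, the clock at which each stage happened."""
    fu = {name: None for name in units}  # FU name -> status entry, None if idle
    result = {}                          # register -> FU that will write it
    times = [{} for _ in program]
    next_issue, clock = 0, 0
    while not all("write" in t for t in times) and clock < limit:
        clock += 1
        actions = []  # decided from the state at the start of this clock
        for name, e in fu.items():
            if e is None:
                continue
            t = times[e["idx"]]
            if "read" not in t:
                if e["Rj"] and e["Rk"]:  # wait until both operands are ready
                    actions.append(("read", name))
            elif "exec" not in t:
                if clock - t["read"] >= latency[units[name]]:
                    actions.append(("exec", name))
            elif all((f["Fj"] != e["Fi"] or not f["Rj"]) and
                     (f["Fk"] != e["Fi"] or not f["Rk"])
                     for n, f in fu.items() if f is not None and n != name):
                actions.append(("write", name))  # no pending reader (WAR)
        if next_issue < len(program):
            kind, dest, s1, s2 = program[next_issue]
            free = [n for n in units if units[n] == kind and fu[n] is None]
            if free and dest not in result:  # no structural or WAW hazard
                actions.append(("issue", free[0]))
        for what, name in actions:
            if what == "issue":
                kind, dest, s1, s2 = program[next_issue]
                qj, qk = result.get(s1), result.get(s2)
                fu[name] = dict(idx=next_issue, Fi=dest, Fj=s1, Fk=s2,
                                Qj=qj, Qk=qk, Rj=qj is None, Rk=qk is None)
                result[dest] = name
                times[next_issue]["issue"] = clock
                next_issue += 1
                continue
            e = fu[name]
            times[e["idx"]][what] = clock
            if what == "read":
                e["Rj"] = e["Rk"] = False
            elif what == "write":
                for f in fu.values():  # wake up instructions waiting on this FU
                    if f is not None and f["Qj"] == name:
                        f["Qj"], f["Rj"] = None, True
                    if f is not None and f["Qk"] == name:
                        f["Qk"], f["Rk"] = None, True
                del result[e["Fi"]]
                fu[name] = None
    return times

UNITS = {"Integer": "int", "Mult1": "mult", "Mult2": "mult",
         "Add": "add", "Divide": "div"}
LATENCY = {"int": 1, "mult": 10, "add": 2, "div": 40}  # load latency assumed
PROGRAM = [
    ("int", "F6", None, "R2"),   # LD   F6, 34(R2)
    ("int", "F2", None, "R3"),   # LD   F2, 45(R3)
    ("mult", "F0", "F2", "F4"),  # MULT F0, F2, F4
    ("add", "F8", "F6", "F2"),   # SUBD F8, F6, F2
    ("div", "F10", "F0", "F6"),  # DIVD F10, F0, F6
    ("add", "F6", "F8", "F2"),   # ADDD F6, F8, F2
]

times = simulate(PROGRAM, UNITS, LATENCY)
print(times[0])  # {'issue': 1, 'read': 2, 'exec': 3, 'write': 4}
```

Collecting the actions first and applying them afterwards is a design choice that keeps a result written in one clock from being consumed before the next clock, which matches the one-stage-per-clock progression of the first load.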

### Scoreboard Example

#### FP Add latency = 2 clocks, Multiply = 10, Divide = 40

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | | | | |
| LD F2 | 45+ | R3 | | | | |
| MULT F0 | F2 | F4 | | | | |
| SUBD F8 | F6 | F2 | | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| | FU | | | | | | | | | |

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | | |
| LD F2 | 45+ | R3 | | | | |
| MULT F0 | F2 | F4 | | | | |
| SUBD F8 | F6 | F2 | | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F6 | | R2 | | | | Yes |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | FU | | | | Integer | | | | | |

Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | |
| LD F2 | 45+ | R3 | | | | |
| MULT F0 | F2 | F4 | | | | |
| SUBD F8 | F6 | F2 | | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F6 | | R2 | | | | Yes |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | FU | | | | Integer | | | | | |

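Instruction status:

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | | | | |
| MULT F0 | F2 | F4 | | | | |
| SUBD F8 | F6 | F2 | | | | |
| DIVD F10 | F0 | F6 | | | | |
| ADDD F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F6 | | R2 | | | | Yes |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | FU | | | | Integer | | | | | |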

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 |  |  |  |  |  |  |  |  |
| MULT F0 | F2 | F4 |  |  |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 |  |  |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 |  |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | Yes | Load | F2 |  | R3 |  |  |  | Yes |
|  | Mult1 |  | No |  |  |  |  |  |  |  |  |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | No |  |  |  |  |  |  |  |  |
|  | Divide |  | No |  |  |  |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 5 |  | FU |  | Integer |  |  |  |  |  |  |  |
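
The second LD issues at clock 5 rather than clock 2 because the scoreboard holds an instruction at Issue while its functional unit is busy (structural hazard) or while another active instruction already targets the same destination register (WAW hazard). A minimal Python sketch of that check follows; the function name `can_issue` and the two dictionaries `busy` and `result` are illustrative assumptions, not part of the slides.

```python
def can_issue(unit, dest, busy, result):
    """Return True when an instruction may leave the Issue stage.

    unit   -- functional unit the instruction needs (e.g. "Integer")
    dest   -- destination register (e.g. "F2")
    busy   -- dict: unit name -> True while the unit is occupied
    result -- dict: register -> unit that will write it
              (the "Register result status" row)
    """
    structural_hazard = busy.get(unit, False)
    waw_hazard = result.get(dest) is not None
    return not structural_hazard and not waw_hazard


# State while the first LD still holds the Integer unit.
busy = {"Integer": True, "Mult1": False, "Mult2": False,
        "Add": False, "Divide": False}
result = {"F6": "Integer"}
print(can_issue("Integer", "F2", busy, result))  # False: Integer unit is busy

# Once the first LD has written F6, the unit and the register are released.
busy["Integer"] = False
result.pop("F6")
print(can_issue("Integer", "F2", busy, result))  # True: LD F2 can issue
```

Because issue is in order, the instructions behind the stalled LD wait as well, even though their own units are free.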

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 |  |  |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 |  |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 |  |  |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 |  |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | Yes | Load | F2 |  | R3 |  |  |  | Yes |
|  | Mult1 |  | Yes | Mult | F0 | F2 | F4 | Integer |  | No | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | No |  |  |  |  |  |  |  |  |
|  | Divide |  | No |  |  |  |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 6 |  | FU | Mult1 | Integer |  |  |  |  |  |  |  |


| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 |  |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 |  |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 |  |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 |  |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | Yes | Load | F2 |  | R3 |  |  |  | Yes |
|  | Mult1 |  | Yes | Mult | F0 | F2 | F4 | Integer |  | No | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | Yes | Sub | F8 | F6 | F2 |  | Integer | Yes | No |
|  | Divide |  | No |  |  |  |  |  |  |  |  |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 7 |  | FU | Mult1 | Integer |  |  | Add |  |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 |  |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 |  |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 |  |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | Yes | Load | F2 |  | R3 |  |  |  | Yes |
|  | Mult1 |  | Yes | Mult | F0 | F2 | F4 | Integer |  | No | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | Yes | Sub | F8 | F6 | F2 |  | Integer | Yes | No |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 8 |  | FU | Mult1 | Integer |  |  | Add | Divide |  |  |  |
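
When DIVD issues at clock 8, its row is filled in from the register result status: F0 is still owed by Mult1, so Qj = Mult1 and Rj = No, while F6 has no pending writer, so Rk = Yes. The following Python sketch shows that bookkeeping under the assumption that each functional unit row is a dictionary keyed by the column names; `issue`, `fu` and `result` are hypothetical names used only for illustration.

```python
def issue(fu, unit, op, dest, src_j, src_k, result):
    """Record a newly issued instruction in the functional unit status table.

    fu     -- dict: unit name -> row of the table (dict of column values)
    result -- dict: register -> unit that will produce it
    """
    qj = result.get(src_j)  # producer of the first source, if still pending
    qk = result.get(src_k)  # producer of the second source, if still pending
    fu[unit] = {
        "Busy": True, "Op": op, "Fi": dest, "Fj": src_j, "Fk": src_k,
        "Qj": qj, "Qk": qk,
        "Rj": qj is None,   # ready only when no unit still owes the value
        "Rk": qk is None,
    }
    result[dest] = unit     # this unit now owns the destination register


# Register result status just before DIVD issues (clock 7 in the tables).
result = {"F0": "Mult1", "F2": "Integer", "F8": "Add"}
fu = {}
issue(fu, "Divide", "Div", "F10", "F0", "F6", result)

row = fu["Divide"]
print(row["Qj"], row["Rj"], row["Rk"])  # Mult1 False True
print(result["F10"])                    # Divide
```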

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 |  |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 |  |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
|  | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | Yes | Sub | F8 | F6 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 8 |  | FU | Mult1 |  |  |  | Add | Divide |  |  |  |
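
The two clock-8 tables differ only by the second LD writing F2: the Integer unit is released, F2 leaves the register result status, and every row that was waiting on Integer (Mult1 through Qj, Add through Qk) has its ready flag set. A minimal Python sketch of that update is given below. The dictionary layout and the name `write_result` are assumptions for illustration, and the sketch deliberately omits the WAR check that a full scoreboard performs before allowing the write.

```python
def write_result(fu, unit, result):
    """Complete `unit`: wake up waiting units, then release the unit."""
    dest = fu[unit]["Fi"]
    for row in fu.values():
        if row.get("Qj") == unit:
            row["Qj"], row["Rj"] = None, True
        if row.get("Qk") == unit:
            row["Qk"], row["Rk"] = None, True
    if result.get(dest) == unit:
        del result[dest]          # no unit owes this register any more
    fu[unit] = {"Busy": False}    # the unit is free for the next issue


# State taken from the first clock-8 table (only the columns used here).
fu = {
    "Integer": {"Busy": True, "Fi": "F2", "Qj": None, "Qk": None},
    "Mult1": {"Busy": True, "Fi": "F0", "Qj": "Integer", "Qk": None,
              "Rj": False, "Rk": True},
    "Add": {"Busy": True, "Fi": "F8", "Qj": None, "Qk": "Integer",
            "Rj": True, "Rk": False},
    "Divide": {"Busy": True, "Fi": "F10", "Qj": "Mult1", "Qk": None,
               "Rj": False, "Rk": True},
}
result = {"F0": "Mult1", "F2": "Integer", "F8": "Add", "F10": "Divide"}

write_result(fu, "Integer", result)
print(fu["Mult1"]["Rj"], fu["Add"]["Rk"], fu["Divide"]["Rj"])  # True True False
print("F2" in result)                                          # False
```

Divide stays blocked because its Qj names Mult1, not Integer.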

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 |  |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 10 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
| 2 | Add |  | Yes | Sub | F8 | F6 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 9 |  | FU | Mult1 |  |  |  | Add | Divide |  |  |  |

- Read operands for MULT & SUBD? Issue ADDD?
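
Both questions can be answered from the tables: an instruction reads its operands only when Rj and Rk are both Yes, which holds for Mult1 and Add after clock 8 but not for Divide, and ADDD cannot issue because SUBD still occupies the Add unit. A small Python sketch of the read-operands test, assuming each functional unit row is a dictionary with boolean `Rj` and `Rk` entries (an illustrative layout, not one given in the slides):

```python
def can_read_operands(row):
    """An instruction reads its operands only when both sources are ready."""
    return row["Rj"] and row["Rk"]


# Busy rows of the functional unit status after LD F2 has written its result.
fu = {
    "Mult1": {"Op": "Mult", "Rj": True, "Rk": True},
    "Add": {"Op": "Sub", "Rj": True, "Rk": True},
    "Divide": {"Op": "Div", "Rj": False, "Rk": True},  # F0 still owed by Mult1
}

ready = [name for name, row in fu.items() if can_read_operands(row)]
print(ready)  # ['Mult1', 'Add']
```

This matches the clock-9 table, where MULT and SUBD both record 9 under Read operands while DIVD keeps waiting.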

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 |  |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 8 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
| 0 | Add |  | Yes | Sub | F8 | F6 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 11 |  | FU | Mult1 |  |  |  | Add | Divide |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 |  |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 7 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | No |  |  |  |  |  |  |  |  |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 12 |  | FU | Mult1 |  |  |  |  | Divide |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 | 13 |  |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 6 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
|  | Add |  | Yes | Add | F6 | F8 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 13 |  | FU | Mult1 |  |  | Add |  | Divide |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 | 13 | 14 |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 5 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
| 2 | Add |  | Yes | Add | F6 | F8 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 14 |  | FU | Mult1 |  |  | Add |  | Divide |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 | 13 | 14 |  |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 4 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
| 1 | Add |  | Yes | Add | F6 | F8 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 15 |  | FU | Mult1 |  |  | Add |  | Divide |  |  |  |

| Instruction status |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Instruction | j | k | Issue | Read operands | Execution complete | Write result |  |  |  |  |  |
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |  |  |  |  |  |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |  |  |  |  |  |
| MULT F0 | F2 | F4 | 6 | 9 |  |  |  |  |  |  |  |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |  |  |  |  |  |
| DIVD F10 | F0 | F6 | 8 |  |  |  |  |  |  |  |  |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 |  |  |  |  |  |  |
| Functional unit status |  |  |  |  | dest | S1 | S2 | FU for j | FU for k | Fj? | Fk? |
| Time | Name |  | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|  | Integer |  | No |  |  |  |  |  |  |  |  |
| 3 | Mult1 |  | Yes | Mult | F0 | F2 | F4 |  |  | Yes | Yes |
|  | Mult2 |  | No |  |  |  |  |  |  |  |  |
| 0 | Add |  | Yes | Add | F6 | F8 | F2 |  |  | Yes | Yes |
|  | Divide |  | Yes | Div | F10 | F0 | F6 | Mult1 |  | No | Yes |
| Register result status |  |  |  |  |  |  |  |  |  |  |  |
| Clock |  |  | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
| 16 |  | FU | Mult1 |  |  | Add |  | Divide |  |  |  |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             |                    |              |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     |               |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
| 2    | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add  | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

Register result status

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 17    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             |                    |              |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     |               |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
| 1    | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add  | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

Register result status

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 18    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 |              |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     |               |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status

| Time | Name    | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|------|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |      |           |         |         |               |               |          |          |
| 0    | Mult1   | Yes  | Mult | F0        | F2      | F4      |               |               | Yes      | Yes      |
|      | Mult2   | No   |      |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add  | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div  | F10       | F0      | F6      | Mult1         |               | No       | Yes      |

Register result status

| Clock |    | F0    | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|-------|----|----|-----|----|--------|-----|-----|-----|
| 19    | FU | Mult1 |    |    | Add |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     |               |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

Register result status

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|--------|-----|-----|-----|
| 20    | FU |    |    |    | Add |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 |              |

Functional unit status

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | Yes  | Add | F6        | F8      | F2      |               |               | Yes      | Yes      |
|      | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

Register result status

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|--------|-----|-----|-----|
| 21    | FU |    |    |    | Add |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            |                    |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 | 22           |

Functional unit status

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | No   |     |           |         |         |               |               |          |          |
| 40   | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

Register result status

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|----|----|--------|-----|-----|-----|
| 22    | FU |    |    |    |    |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            | 61                 |              |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 | 22           |

Functional unit status

| Time | Name    | Busy | Op  | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|-----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |     |           |         |         |               |               |          |          |
|      | Mult1   | No   |     |           |         |         |               |               |          |          |
|      | Mult2   | No   |     |           |         |         |               |               |          |          |
|      | Add     | No   |     |           |         |         |               |               |          |          |
| 0    | Divide  | Yes  | Div | F10       | F0      | F6      |               |               | Yes      | Yes      |

Register result status

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10    | F12 | ... | F30 |
|-------|----|----|----|----|----|----|--------|-----|-----|-----|
| 61    | FU |    |    |    |    |    | Divide |     |     |     |

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            | 61                 | 62           |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 | 22           |

Functional unit status

| Time | Name    | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |    |           |         |         |               |               |          |          |
|      | Mult1   | No   |    |           |         |         |               |               |          |          |
|      | Mult2   | No   |    |           |         |         |               |               |          |          |
|      | Add     | No   |    |           |         |         |               |               |          |          |
| 0    | Divide  | No   |    |           |         |         |               |               |          |          |

Register result status

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 62    | FU |    |    |    |    |    |     |     |     |     |

## Review: Scoreboard Example Cycle 62

In-order issue; out-of-order execute & commit

Instruction status

| Instruction | j   | k  | Issue | Read operands | Execution complete | Write result |
|-------------|-----|----|-------|---------------|--------------------|--------------|
| LD F6       | 34+ | R2 | 1     | 2             | 3                  | 4            |
| LD F2       | 45+ | R3 | 5     | 6             | 7                  | 8            |
| MULT F0     | F2  | F4 | 6     | 9             | 19                 | 20           |
| SUBD F8     | F6  | F2 | 7     | 9             | 11                 | 12           |
| DIVD F10    | F0  | F6 | 8     | 21            | 61                 | 62           |
| ADDD F6     | F8  | F2 | 13    | 14            | 16                 | 22           |

Functional unit status

| Time | Name    | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU for j) | Qk (FU for k) | Rj (Fj?) | Rk (Fk?) |
|------|---------|------|----|-----------|---------|---------|---------------|---------------|----------|----------|
|      | Integer | No   |    |           |         |         |               |               |          |          |
|      | Mult1   | No   |    |           |         |         |               |               |          |          |
|      | Mult2   | No   |    |           |         |         |               |               |          |          |
|      | Add     | No   |    |           |         |         |               |               |          |          |
| 0    | Divide  | No   |    |           |         |         |               |               |          |          |

Register result status

| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
| 62    | FU |    |    |    |    |    |     |     |     |     |

### CDC 6600 Scoreboard

- Limitations of 6600 scoreboard:
  - No forwarding hardware
  - Limited to instructions in basic block (small window)
  - Small number of functional units (structural hazards), especially integer/load store units
  - Do not issue on structural hazards
  - Wait for WAR hazards
  - Prevent WAW hazards
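
The last three limitations can be read as checks on the functional unit status table. The following is a minimal sketch, not the CDC 6600 logic itself: the class `FU` and the functions `can_issue` and `can_write_result` are hypothetical names, and it assumes the textbook bookkeeping in which Rj and Rk are cleared once a unit has read its operands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FU:
    """One row of the functional unit status table (illustrative only)."""
    name: str
    busy: bool = False
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    rj: bool = False          # Fj is ready and has not been read yet
    rk: bool = False          # Fk is ready and has not been read yet


def can_issue(unit, dest, units):
    """No structural hazard (unit is free) and no WAW hazard (no busy unit writes dest)."""
    return not unit.busy and all(not (u.busy and u.fi == dest) for u in units)


def can_write_result(unit, units):
    """No WAR hazard: no other busy unit still has to read the old value of unit.fi."""
    return not any(
        u is not unit and u.busy
        and ((u.fj == unit.fi and u.rj) or (u.fk == unit.fi and u.rk))
        for u in units
    )


# ADDD F6,F8,F2 has finished executing while DIVD F10,F0,F6 still waits for F0.
add = FU("Add", busy=True, fi="F6", fj="F8", fk="F2")
div = FU("Divide", busy=True, fi="F10", fj="F0", fk="F6", rj=False, rk=True)
mult2 = FU("Mult2")
units = [add, div, mult2]

blocked = can_write_result(add, units)    # DIVD has not read F6 yet
div.rj = div.rk = False                   # DIVD reads its operands
released = can_write_result(add, units)   # ADDD may now overwrite F6
waw = can_issue(mult2, "F10", units)      # Divide already targets F10
```

Under these assumptions `blocked` is `False`, `released` is `True`, and `waw` is `False`, which matches the example: ADDD completes execution in cycle 16 but writes F6 only in cycle 22, after DIVD reads its operands in cycle 21.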

# ANOTHER CASE STUDY EXAMPLE


### **ILP Continues....**

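- Data Hazards
  - LOAD R1, [R2 + 10] // Loads into R1
  - ADD R3, R1, R2 // R3 = R1 + R2
  - This is the "Read After Write (RAW)" data hazard for R1: ADD reads R1, which LOAD writes
- A second sequence:
  - LD R1, [R2 + 10]
  - ADD R3, R1, R12
  - LD R1, [R2 + 14]
  - ADD R12, R1, R2
  - This shows the WAW hazard for R1 (both LDs write R1) and the WAR hazard for R12 (the first ADD reads R12 before the second ADD writes it)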
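
The three hazard types can be found mechanically from the destination and source registers of each instruction. Below is a minimal sketch applied to the four-instruction sequence above; the function name `hazards` and the `(dest, sources)` encoding are assumptions made for illustration.

```python
def hazards(program):
    """Return (kind, i, j, reg) tuples for 1-based instruction pairs i < j.

    program is a list of (dest, sources); dest is None for instructions
    that write no register. A pair is reported only if no instruction in
    between rewrites the register involved.
    """
    def rewritten_between(reg, i, j):
        return any(program[k][0] == reg for k in range(i + 1, j))

    found = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i is not None and not rewritten_between(dest_i, i, j):
                if dest_i in srcs_j:           # j reads what i wrote
                    found.append(("RAW", i + 1, j + 1, dest_i))
                if dest_i == dest_j:           # both write the same register
                    found.append(("WAW", i + 1, j + 1, dest_i))
            if (dest_j is not None and dest_j in srcs_i
                    and not rewritten_between(dest_j, i, j)):
                found.append(("WAR", i + 1, j + 1, dest_j))  # j overwrites what i reads
    return found


program = [
    ("R1", ("R2",)),        # 1. LD  R1, [R2 + 10]
    ("R3", ("R1", "R12")),  # 2. ADD R3, R1, R12
    ("R1", ("R2",)),        # 3. LD  R1, [R2 + 14]
    ("R12", ("R1", "R2")),  # 4. ADD R12, R1, R2
]
result = hazards(program)
for kind, i, j, reg in result:
    print(f"{kind} on {reg} between instructions {i} and {j}")
```

Besides the WAW on R1 (1, 3) and the WAR on R12 (2, 4) named on the slide, the sketch also reports the RAW pairs (1, 2) and (3, 4) and a WAR on R1 between instructions 2 and 3.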

## ILP - Pipelining Advanced



## Difficulties in Superscalar Construction

- Ensuring that there are no data hazards among the several instructions executing in different execution units at the same time.
- If this is done by the compiler: Static Instruction Scheduling (VLIW, Itanium)
- If this is done by the hardware: Dynamic Instruction Scheduling (Tomasulo, MIPS Embedded Processor)

## Static Instruction Scheduling

- The compiler makes bundles of "K" instructions that can be issued to the execution units at the same time, such that there are no data dependencies between them.
  - Very Long Instruction Word (VLIW) to accommodate "K" instructions at a time
- Lots of NOPs if a bundle cannot be filled with relevant instructions
  - Increases the size of the executable
- Does not complicate the hardware
- Code portability: what happens if the next-generation processor has K+5 units (say)?
  - Solved by a software/firmware emulator, which hurts performance.
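
The bundling idea and the NOP overhead can be sketched as follows. This is an illustrative greedy, in-order packer and not how a production VLIW compiler schedules code: the names `pack`, `conflicts` and `NOP` are hypothetical, and treating WAR conflicts inside a bundle as forbidden is a conservative assumption.

```python
NOP = ("NOP", None, ())


def conflicts(earlier, later):
    """True if `later` has a RAW, WAW or WAR register conflict with `earlier`."""
    _, dest_e, srcs_e = earlier
    _, dest_l, srcs_l = later
    raw_or_waw = dest_e is not None and (dest_e in srcs_l or dest_e == dest_l)
    war = dest_l is not None and dest_l in srcs_e
    return raw_or_waw or war


def pack(program, k):
    """Pack instructions in program order into bundles of k, padding with NOPs."""
    bundles, current = [], []
    for ins in program:
        if len(current) == k or any(conflicts(prev, ins) for prev in current):
            bundles.append(current + [NOP] * (k - len(current)))
            current = []
        current.append(ins)
    if current:
        bundles.append(current + [NOP] * (k - len(current)))
    return bundles


# (text, destination register, source registers)
program = [
    ("ADD R1, R2, R3", "R1", ("R2", "R3")),
    ("ST R1, [R4+50]", None, ("R1", "R4")),
    ("ADD R1, R5, R6", "R1", ("R5", "R6")),
    ("SUB R7, R1, R8", "R7", ("R1", "R8")),
    ("ST R1, [R4+54]", None, ("R1", "R4")),
    ("ADD R1, R9, R10", "R1", ("R9", "R10")),
]
bundles = pack(program, 3)
nops = sum(ins is NOP for bundle in bundles for ins in bundle)
for bundle in bundles:
    print(" | ".join(text for text, _, _ in bundle))
```

With K = 3, this heavily dependent sequence needs five bundles and nine NOP slots for six useful instructions, which illustrates the executable-size cost noted above.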

## Dynamic Instruction Scheduling

- The data hazards are handled by the hardware
  - RAW using Operand Forwarding Technique
  - WAR and WAW using Register Renaming Technique

## **Processor Overview**


Why should the result of LD go to R2 in the register file and then be reloaded into the ALU?

Forward it to the ALU on its way to the register file.



## Register Renaming

- 1. ADD R1, R2, R3
- 2. ST R1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7,R1,R8
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

#### Dependencies due to Reg R1

## Register Renaming: Static Scheduling

- 1. ADD R1, R2, R3
- 2. ST R1, [R4+50]
- 3. ADD R12, R5, R6
- 4. SUB R7,R12,R8
- 5. ST R12, [R4 + 54]
- 6. ADD R1, R9,R10

Rename R1 to R12 from Instruction 3 up to (but not including) Instruction 6.

Dependencies are considered only within a window, not across the whole program.

The only remaining WAW and WAR hazards are between (1,6) and (2,6), which are far apart in program order.

Increases register pressure for the compiler.
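
The renaming step above can be sketched as a textual substitution over a range of instructions. The helper `rename` is a hypothetical name, and the sketch assumes that R12 is not otherwise used in the window.

```python
import re


def rename(program, old, new, first, last):
    """Rename register `old` to `new` in instructions first..last (1-based, inclusive)."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")  # \b keeps R1 from matching R10 or R12
    return [
        pattern.sub(new, ins) if first <= n <= last else ins
        for n, ins in enumerate(program, start=1)
    ]


program = [
    "ADD R1, R2, R3",
    "ST R1, [R4+50]",
    "ADD R1, R5, R6",
    "SUB R7, R1, R8",
    "ST R1, [R4+54]",
    "ADD R1, R9, R10",
]
renamed = rename(program, "R1", "R12", first=3, last=5)
for n, ins in enumerate(renamed, start=1):
    print(f"{n}. {ins}")
```

Instructions 3 to 5 now use R12, so they no longer conflict with instructions 1 and 2 over R1.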



Instructions are fetched one by one and decoded to find the type of operation and the source of the operands.

The Register Status Indicator records whether the latest value of a register is in the register file or is currently being computed by some execution unit; in the latter case it holds the execution unit number.

If all operands are available, the operation proceeds in the allotted execution unit; otherwise it waits in the reservation station of the allotted execution unit, monitoring the Common Data Bus (CDB).

Every execution unit writes its result, along with its unit number, onto the CDB, which forwards it to all reservation stations, the register file and memory.
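
The CDB broadcast described above can be sketched as follows. This is a simplified model under stated assumptions: the names `Station`, `snoop` and `broadcast`, the tag fields `qj`/`qk` (0 meaning the value is already present), and the register values are all chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """A reservation station entry (illustrative only)."""
    unit: int
    op: str
    vj: Optional[int] = None  # value of operand j, once known
    vk: Optional[int] = None  # value of operand k, once known
    qj: int = 0               # unit that will produce operand j (0 = value is in vj)
    qk: int = 0               # unit that will produce operand k (0 = value is in vk)

    def ready(self):
        return self.qj == 0 and self.qk == 0

    def snoop(self, producer, value):
        """Capture a CDB broadcast if this station is waiting on `producer`."""
        if self.qj == producer:
            self.vj, self.qj = value, 0
        if self.qk == producer:
            self.vk, self.qk = value, 0


def broadcast(producer, value, stations, reg_status, reg_file):
    """Put (unit number, result) on the CDB: stations and the register file listen."""
    for station in stations:
        station.snoop(producer, value)
    for reg, unit in reg_status.items():
        if unit == producer:      # only the latest producer updates the register file
            reg_file[reg] = value
            reg_status[reg] = 0


# SUB R7, R1, R8 sits in unit 4 waiting for unit 3; R1 is already claimed by unit 6.
reg_file = {"R1": 10, "R8": 5}
reg_status = {"R1": 6, "R7": 4}
sub = Station(unit=4, op="SUB", vk=reg_file["R8"], qj=3)
broadcast(3, 30, [sub], reg_status, reg_file)
```

After the broadcast, `sub` is ready with `vj == 30`, while `reg_file["R1"]` keeps its old value because the Register Status Indicator names unit 6, not unit 3, as the latest producer of R1.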

## An Example:


Instruction Fetch



- 1. ADD R1, R2, R3
- 2. ST R1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7,R1,R8
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

#### Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |

| Empty | Empty | Empty | Empty | Empty | Empty |
|-------|-------|-------|-------|-------|-------|


Instruction Fetch

ADD R1, R2, R3

- 1. ---
- 2. ST R1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7,R1,R8
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|------------|-----|
| Status        | 1  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0          | 0   |

| Ins 1 | Empty | Empty | Empty | Empty | Empty |
|-------|-------|-------|-------|-------|-------|


Instruction Fetch

ST R1, [R4+50]

- 1. ---
- 2. ---
- 3. ADD R1, R5, R6
- 4. SUB R7,R1,R8
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 1  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |

| I 1, E | I 2, W 1 | Empty | Empty | Empty | Empty |
|--------|----------|-------|-------|-------|-------|




Instruction Fetch

ADD R1, R5, R6

- 1. ---
- 2. ---
- 3. ---
- 4. SUB R7,R1,R8
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|------------|-----|
| Status        | 3  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0          | 0   |

| I 1, E | I 2, W 1 | I 3, E | Empty | Empty | Empty |
|--------|----------|--------|-------|-------|-------|

Note: Reservation Station stores the number of the execution unit that shall yield the latest value of a register.




Instruction Fetch

SUB R7,R1,R8



- 1. ---
- 2. ---
- 3. ---
- 4. ---
- 5. ST R1, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 3  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0  | 0   |

## An Example:


Instruction Fetch

ST R1, [R4 + 54]

- 1. ---
- 2. ---
- 3. ---
- 4. ---
- 5. ---
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 3  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0  | 0   |




Instruction Fetch

ADD R1, R9, R10



Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 6  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0  | 0   |



Instruction Fetch

ADD **R1**, R9, R10

- 1. ADD R1, R2, R3
- 2. ST U1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7, U3, R8
- 5. ST U3, [R4 + 54]
- 6. ADD R1, R9, R10

#### Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|------------|-----|
| Status        | 6  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0          | 0   |

| I 1, E | I 2, W 1 | I 3, E | I 4, W 3 | I 5, W 3 | I 6, E |
|--------|----------|--------|----------|----------|--------|

Effectively, three instructions are executing and the others are waiting for the appropriate results. The whole program is converted as shown above.



Instruction Fetch

ADD R1, R9, R10

- 1. ADD R1, R2, R3
- 2. ST U1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7, U3, R8
- 5. ST U3, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 6  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0  | 0   |

Note that operand forwarding and register renaming are done automatically.



Instruction Fetch

ADD R1, R9, R10

- 1. ADD R1, R2, R3
- 2. ST U1, [R4+50]
- 3. ADD R1, R5, R6
- 4. SUB R7, U3, R8
- 5. ST U3, [R4 + 54]
- 6. ADD R1, R9, R10

Register Status Indicator

| Reg<br>Number | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---------------|----|----|----|----|----|----|----|----|----|-----|
| Status        | 6  | 0  | 0  | 0  | 0  | 0  | 4  | 0  | 0  | 0   |

| I1, E | I2, W1 | I3, E | I4, W3 | I5, W3 | I6, E |
|-------|--------|-------|--------|--------|-------|

On completion, execution unit 6 will reset the R1 entry in the Register Status Indicator to 0. Similarly, unit 4 will reset the R7 entry to 0.
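The walk-through above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: each instruction gets its own execution unit numbered in program order, no unit completes while the six instructions are being issued, and the names `issue_all`, `complete`, and the tuple layout are hypothetical, not taken from the slides.

```python
# Sketch of issue-time renaming driven by a register status indicator.
# Assumptions (not from the slides): one execution unit per instruction,
# numbered in program order, and no unit finishes during issue.

def issue_all(program):
    """program: list of (op, dest, sources); dest is None for stores."""
    status = {}   # register -> number of the unit that will write it
    renamed = []
    for unit, (op, dest, sources) in enumerate(program, start=1):
        # A source with a pending writer is replaced by that unit's tag U<n>.
        new_sources = [f"U{status[s]}" if status.get(s, 0) else s
                       for s in sources]
        renamed.append((op, dest, new_sources))
        if dest is not None:
            status[dest] = unit   # this unit now yields the latest value
    return renamed, status


def complete(status, unit):
    """On completion, a unit clears only the entries it still owns."""
    for reg, owner in status.items():
        if owner == unit:
            status[reg] = 0


program = [
    ("ADD", "R1", ["R2", "R3"]),
    ("ST", None, ["R1", "R4"]),    # ST R1, [R4 + 50]
    ("ADD", "R1", ["R5", "R6"]),
    ("SUB", "R7", ["R1", "R8"]),
    ("ST", None, ["R1", "R4"]),    # ST R1, [R4 + 54]
    ("ADD", "R1", ["R9", "R10"]),
]
renamed, status = issue_all(program)
for number, (op, dest, sources) in enumerate(renamed, start=1):
    print(number, op, dest, sources)
print(status)
```

Under these assumptions, instructions 2, 4, and 5 pick up the tags `U1`, `U3`, and `U3`, and the status entries end at 6 for R1 and 4 for R7, matching the table. Calling `complete(status, 3)` leaves R1 untouched because unit 6 has since taken ownership of it.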

### Dynamic Scheduling

- Rearrange order of instructions to reduce stalls while maintaining data flow
- Advantages:
  - Compiler doesn't need to have knowledge of microarchitecture
  - Handles cases where dependencies are unknown at compile time
- Disadvantages:
  - Substantial increase in hardware complexity
  - Complicates exceptions

### Dynamic Scheduling

- Dynamic scheduling implies:
  - Out-of-order execution
  - Out-of-order completion
- Creates the possibility for WAR and WAW hazards
- Tomasulo's Approach
  - Tracks when operands are available
  - Introduces register renaming in hardware
    - Minimizes WAW and WAR hazards

#### Register Renaming

#### Example:

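DIV.D F0,F2,F4

ADD.D F6,F0,F8

S.D F6,0(R1)

SUB.D F8,F10,F14

MUL.D F6,F10,F8

- Antidependence on F8 (ADD.D reads it, SUB.D later writes it)
- Antidependence on F6 (S.D reads it, MUL.D later writes it)
- Name (output) dependence on F6 between ADD.D and MUL.D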

#### Register Renaming

Example:

```
DIV.D F0,F2,F4
ADD.D S,F0,F8
S.D S,0(R1)
SUB.D T,F10,F14
MUL.D F6,F10,T
```

Now only RAW hazards remain, which can be strictly ordered.
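The effect of renaming can be checked mechanically. The sketch below is a simple pairwise classifier, not a hardware model: it compares every earlier instruction with every later one and ignores intervening writes. The function name `hazards` and the `(dest, sources)` encoding are illustrative assumptions; `S` and `T` are the temporary names used in the renamed sequence.

```python
# Pairwise dependence classifier (a sketch, assuming straight-line code).
# Each instruction is (dest, sources); dest is None for a store.

def hazards(program):
    """Return a set of (kind, register, earlier_index, later_index)."""
    found = set()
    for j, (dest_j, src_j) in enumerate(program):
        for i in range(j):
            dest_i, src_i = program[i]
            if dest_i is not None and dest_i in src_j:
                found.add(("RAW", dest_i, i, j))   # true data dependence
            if dest_j is not None and dest_j in src_i:
                found.add(("WAR", dest_j, i, j))   # antidependence
            if dest_j is not None and dest_j == dest_i:
                found.add(("WAW", dest_j, i, j))   # output dependence
    return found


original = [
    ("F0", ["F2", "F4"]),     # DIV.D F0,F2,F4
    ("F6", ["F0", "F8"]),     # ADD.D F6,F0,F8
    (None, ["F6", "R1"]),     # S.D   F6,0(R1)
    ("F8", ["F10", "F14"]),   # SUB.D F8,F10,F14
    ("F6", ["F10", "F8"]),    # MUL.D F6,F10,F8
]
renamed = [
    ("F0", ["F2", "F4"]),     # DIV.D F0,F2,F4
    ("S", ["F0", "F8"]),      # ADD.D S,F0,F8
    (None, ["S", "R1"]),      # S.D   S,0(R1)
    ("T", ["F10", "F14"]),    # SUB.D T,F10,F14
    ("F6", ["F10", "T"]),     # MUL.D F6,F10,T
]

before = hazards(original)
after = hazards(renamed)
print(sorted(before))
print(sorted(after))
```

On the original sequence the classifier reports RAW, WAR, and WAW pairs; on the renamed sequence only RAW pairs are left.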

#### Register Renaming

- Register renaming is provided by reservation stations (RS)
  - Contains:
    - The instruction
    - Buffered operand values (when available)
    - Reservation station number of instruction providing the operand values
  - RS fetches and buffers an operand as soon as it becomes available (not necessarily involving register file)
  - Pending instructions designate the RS to which they will send their output
    - Result values broadcast on a result bus, called the common data bus (CDB)
  - Only the last output updates the register file
  - As instructions are issued, the register specifiers are renamed with the reservation station
  - There may be more reservation stations than registers
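A minimal sketch of these fields and of a CDB broadcast, in Python. The `Station` class, the `broadcast` function, the station name `Store1`, and the numeric values are all hypothetical, chosen only to illustrate the capture of operands and the rule that only the last output updates the register file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    op: str
    vj: Optional[float] = None   # buffered operand values (when available)
    vk: Optional[float] = None
    qj: Optional[str] = None     # name of the RS that will produce operand j
    qk: Optional[str] = None     # name of the RS that will produce operand k

    def ready(self):
        """Ready once no operand is still waiting on another station."""
        return self.qj is None and self.qk is None


def broadcast(tag, value, stations, reg_status, regs):
    """Place (tag, value) on the CDB."""
    for rs in stations:
        # Waiting stations capture the value without touching the register file.
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, owner in list(reg_status.items()):
        # The register file is written only if this tag is still the latest writer.
        if owner == tag:
            regs[reg] = value
            del reg_status[reg]


# Add1 and Mult1 both target F6; Mult1 was issued later, so it owns F6.
regs = {"F6": 0.0}
reg_status = {"F6": "Mult1"}
waiting = Station("Store1", "S.D", qj="Add1")   # consumer of Add1's result

broadcast("Add1", 3.0, [waiting], reg_status, regs)
f6_after_add1 = regs["F6"]    # unchanged: Add1 is not the last writer
broadcast("Mult1", 8.0, [waiting], reg_status, regs)
print(waiting, f6_after_add1, regs)
```

In this sketch the waiting station receives Add1's value directly from the bus, while F6 in the register file is written only by Mult1's broadcast.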

### Tomasulo's Algorithm

- Load and store buffers
  - Contain data and addresses, act like reservation stations

Top-level design: (figure not reproduced; its input is labeled "From instruction unit")

#### Tomasulo's Algorithm

#### Three Steps:

- Issue
  - Get next instruction from FIFO queue
  - If an RS is available, issue the instruction to the RS, with operand values if available
  - If operand values are not available, the instruction waits in the RS for them; if no RS is available, stall the instruction
- Execute
  - When operand becomes available, store it in any reservation stations waiting for it
  - When all operands are ready, begin executing the instruction
  - Loads and stores are maintained in program order through the effective address calculation
  - No instruction is allowed to initiate execution until all branches that precede it in program order have completed
- Write result
  - Write result on CDB into reservation stations and store buffers
    - (Stores must wait until address and value are received)

# Example

Instruction status

| Instruction     | Issue | Execute | Write Result |
|-----------------|-------|---------|--------------|
| L.D F6,32(R2)   | √     | √       | √            |
| L.D F2,44(R3)   | √     | √       |              |
| MUL.D F0,F2,F4  | √     |         |              |
| SUB.D F8,F2,F6  | √     |         |              |
| DIV.D F10,F0,F6 | √     |         |              |
| ADD.D F6,F8,F2  | √     |         |              |

Reservation stations

| Name  | Busy | Op   | Vj | Vk                 | Qj    | Qk    | A             |
|-------|------|------|----|--------------------|-------|-------|---------------|
| Load1 | No   |      |    |                    |       |       |               |
| Load2 | Yes  | Load |    |                    |       |       | 44 + Regs[R3] |
| Add1  | Yes  | SUB  |    | Mem[32 + Regs[R2]] | Load2 |       |               |
| Add2  | Yes  | ADD  |    |                    | Add1  | Load2 |               |
| Add3  | No   |      |    |                    |       |       |               |
| Mult1 | Yes  | MUL  |    | Regs[F4]           | Load2 |       |               |
| Mult2 | Yes  | DIV  |    | Mem[32 + Regs[R2]] | Mult1 |       |               |

Register status

| Field | F0    | F2    | F4 | F6   | F8   | F10   | F12 | … | F30 |
|-------|-------|-------|----|------|------|-------|-----|---|-----|
| Qi    | Mult1 | Load2 |    | Add2 | Add1 | Mult2 |     |   |     |
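The Qj/Qk and Qi entries in this example can be reproduced with a small model of the issue step. The sketch below is an illustration only: operand values are kept as symbolic strings, the timing is an assumption (both loads issue, then the first load writes its result, then the remaining four instructions issue), and the names `issue`, `UNITS`, and `GROUPS` are hypothetical.

```python
# Sketch of the Tomasulo issue step with symbolic operand values.
UNITS = {"L.D": "Load", "ADD.D": "Add", "SUB.D": "Add",
         "MUL.D": "Mult", "DIV.D": "Mult"}
GROUPS = {"Load": ["Load1", "Load2"],
          "Add": ["Add1", "Add2", "Add3"],
          "Mult": ["Mult1", "Mult2"]}


def issue(instr, busy, reg_status, regs):
    """Claim a free RS, read or rename the operands, rename the destination."""
    op, dest, a, b = instr
    name = next((n for n in GROUPS[UNITS[op]] if n not in busy), None)
    if name is None:
        return None                        # no free RS: issue stalls
    entry = {"Op": op}
    if op == "L.D":
        entry["A"] = f"{a} + Regs[{b}]"    # a is the offset, b the base register
    else:
        for v, q, src in (("Vj", "Qj", a), ("Vk", "Qk", b)):
            if src in reg_status:
                entry[q] = reg_status[src]                 # wait on producing RS
            else:
                entry[v] = regs.get(src, f"Regs[{src}]")   # value is available
    busy[name] = entry
    reg_status[dest] = name                # this RS now produces dest
    return name


program = [
    ("L.D", "F6", 32, "R2"), ("L.D", "F2", 44, "R3"),
    ("MUL.D", "F0", "F2", "F4"), ("SUB.D", "F8", "F2", "F6"),
    ("DIV.D", "F10", "F0", "F6"), ("ADD.D", "F6", "F8", "F2"),
]
busy, reg_status, regs = {}, {}, {}
first = issue(program[0], busy, reg_status, regs)
issue(program[1], busy, reg_status, regs)

# Assumed timing: the first load writes its result before anything else issues.
regs["F6"] = f"Mem[{busy[first]['A']}]"
del busy[first]
del reg_status["F6"]

for instr in program[2:]:
    issue(instr, busy, reg_status, regs)

for name, entry in busy.items():
    print(name, entry)
print(reg_status)
```

Under the assumed timing, the busy stations and the register status map agree with the tables: SUB.D and MUL.D wait on Load2, DIV.D waits on Mult1, ADD.D waits on Add1 and Load2, and F6 is owned by Add2 rather than by the completed load.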

#### THANK YOU!!
