#### Instruction Level Parallelism

Explicit register renaming

## Advanced Computer Architectures Explicit Register Renaming

- Tomasulo provides Implicit Register Renaming
  - User registers renamed to reservation station tags
- Now we introduce Explicit Register Renaming:
  - Use *physical* register file that is larger than number of registers specified by the ISA
- Key insight: Allocate a new physical destination register for every instruction that writes
  - Very similar to a compiler transformation called Static Single Assignment (SSA) form — but in hardware!
  - Removes all chance of WAR or WAW hazards
  - Like Tomasulo, good for allowing full out-of-order completion
  - Like hardware-based dynamic compilation?


Explicit register renaming (MIPS R10000 Style)

[Figure: floating-point register file (F0, F2, ..., F30; 64-bit entries) mapped onto the physical register file (P0, P1, P2, P3, ...; 32-bit entries).]

N.B. Physical registers are 32 bits wide while FP registers are 64 bits wide, so mapping one FP register takes two physical registers.

- Physical Register File larger than ISA Register File
- On issue, each instruction that writes a result is allocated new physical register from **Freelist**
- When a physical register P0 is "dead" (or not "live"), we free up



[Figure: freelist holding P32, P34, P36, P38, ..., P60, P62.]

#### Explicit Register Renaming

- Mechanism? Keep a translation table:
  - ISA register → physical register mapping
  - When register written, replace entry with new register from freelist
  - Physical register becomes free when not used by any active instructions
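The translation table and freelist can be sketched in a few lines of Python. This is a minimal illustration only: the class name `RenameTable`, the initial identity mapping, and the register counts are assumptions made for the example, not details taken from the slides.

```python
from collections import deque

class RenameTable:
    """Toy ISA-register -> physical-register map with a freelist (illustrative names)."""

    def __init__(self, arch_regs, num_physical):
        # Assumption: architectural register i initially lives in physical register i.
        self.mapping = {r: f"P{i}" for i, r in enumerate(arch_regs)}
        self.freelist = deque(f"P{i}" for i in range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        """Return (new_dest, renamed_sources); raise if no physical register is free."""
        # Sources are translated BEFORE the destination is remapped, so an
        # instruction that reads and writes the same register sees the old value.
        renamed_sources = [self.mapping[s] for s in sources]
        if not self.freelist:
            raise RuntimeError("no free physical register: stall")
        new_dest = self.freelist.popleft()
        self.mapping[dest] = new_dest  # replace the table entry with the new register
        return new_dest, renamed_sources

table = RenameTable(["x1", "x2", "x3"], num_physical=6)
first = table.rename("x3", ["x1", "x2"])   # x3 = x1 op x2
second = table.rename("x3", ["x3", "x1"])  # x3 = x3 op x1 (reads the renamed x3)
print(first, second)
```

The two writes to `x3` land in different physical registers, which is what removes the WAW hazard between them.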



#### Unified Physical Register File

- Rename all architectural registers into a single *physical* register file during decode, no register values read
- Functional units read and write from single unified register file holding committed and temporary registers in execution
- Commit only updates mapping of architectural register to physical register, no data movement



#### **Instruction Commit**

- Record that the mapping between an architectural register number and physical register number is no longer speculative
- Free up any physical registers being used to hold the "older" value of the architectural register
- Deallocating registers is more complicated:
  - Before freeing up a physical register we must know that it no longer corresponds to an architectural register and that no further uses of the physical register are outstanding
  - A physical register corresponds to an architectural register until the architectural register is **rewritten**
  - However, there may be uses of the physical register outstanding. The
    processor must check if any source operand corresponds to that register in
    the functional units queue. If it does not appear then it can be
    deallocated.
    - Alternatively, the processor can simply wait until another instruction that writes the same architectural register commits. This may tie up a physical register slightly longer than necessary, but it is easy to implement.

## Hw register renaming

- Renaming map: simple data structure that supplies the physical register number of the register that currently corresponds to the requested architectural register
- Instruction commit: update permanently the renaming table to indicate that the physical register holding the destination value corresponds to the actual architectural register
- Use ROB to enforce in-order commit


#### Pipeline Design with Physical Regfile



#### Advantages of Explicit Renaming

- Decouples renaming from scheduling:
  - Pipeline can be exactly like "standard" DLX pipeline (perhaps with multiple operations issued per cycle)
  - Or, the pipeline could be Tomasulo-like, a scoreboard, etc.
  - Standard forwarding or bypassing could be used
- Allows data to be fetched from single register file
  - No need to bypass values from reorder buffer
  - This can be important for balancing pipeline
- Many processors use a variant of this technique:
  - R10000, Alpha 21264, HP PA8000

#### **Explicit Renaming Support**

- Rapid access to a table of translations
- A physical register file that has more registers than specified by the ISA
- Ability to figure out which physical registers are free.
  - No free registers → stall on issue
- Thus, register renaming doesn't require reservation stations. However:
  - Many modern architectures use explicit register renaming + Tomasulo-like reservation stations to control execution.
- Two Questions:
  - How do we manage the "free list"?
  - How does Explicit Register Renaming mix with Precise Interrupts?

#### Lifetime of Physical Registers

- Physical register file holds committed and speculative values
- Physical registers decoupled from ROB entries (no data in ROB)

```
ld x1, (x3)
addi x3, x1, #4
sub x6, x7, x9
add x3, x3, x6
ld x6, (x1)
add x6, x6, x3
sd x6, (x1)
ld x6, (x11)
```

```
ld P1, (Px)
addi P2, P1, #4
sub P3, Py, Pz
add P4, P2, P3
ld P5, (P1)
add P6, P5, P4
sd P6, (P1)
ld P7, (Pw)
```

When can we reuse a physical register?

When **next** writer of same architectural register commits
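The renaming above, and the point at which each physical register becomes reusable, can be reproduced with a short sketch. The function name `rename_and_find_release` and the placeholder names such as `P(x3)` for mappings that exist before the snippet (Px, Py, Pz, Pw in the slide) are assumptions for illustration.

```python
program = [
    ("ld",   "x1", ["x3"]),
    ("addi", "x3", ["x1"]),
    ("sub",  "x6", ["x7", "x9"]),
    ("add",  "x3", ["x3", "x6"]),
    ("ld",   "x6", ["x1"]),
    ("add",  "x6", ["x6", "x3"]),
    ("sd",   None, ["x6", "x1"]),   # a store writes no register
    ("ld",   "x6", ["x11"]),
]

def rename_and_find_release(program):
    mapping = {}      # architectural -> current physical register
    next_phys = 1     # the slide numbers new registers P1, P2, ...
    renamed = []
    release_at = {}   # physical register -> index of the instruction whose commit frees it
    for index, (op, dest, sources) in enumerate(program):
        # Registers mapped before the snippet get a placeholder name.
        phys_sources = [mapping.get(s, f"P({s})") for s in sources]
        phys_dest = None
        if dest is not None:
            previous = mapping.get(dest)
            if previous is not None:
                # This instruction is the NEXT writer of `dest`:
                # its commit releases the previous physical register.
                release_at[previous] = index
            phys_dest = f"P{next_phys}"
            next_phys += 1
            mapping[dest] = phys_dest
        renamed.append((op, phys_dest, phys_sources))
    return renamed, release_at

renamed, release_at = rename_and_find_release(program)
for row in renamed:
    print(row)
print(release_at)
```

In this run P2 is released when instruction 3 (`add x3`) commits, while P1, P4 and P7 stay allocated because no later writer of x1, x3 or x6 appears in the snippet.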

#### Physical Register Management







| Instruction     |
|-----------------|
| ld x1, 0(x3)    |
| addi x3, x1, #4 |
| sub x6, x7, x6  |
| add x3, x3, x6  |
| ld x6, 0(x1)    |

ROB:

| use | ex | op | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|----|----|-----|----|-----|----|------|-----|
|     |    |    |    |     |    |     |    |      |     |
|     |    |    |    |     |    |     |    |      |     |

(LPRd requires third read port on Rename Table for each instruction)

#### Physical Register Management



| use | ex | op | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|----|----|-----|----|-----|----|------|-----|
| x   |    | ld | p  | P7  |    |     | x1 | P8   | P0  |

Why do we need to keep track of P8?

P8 still holds the previous value of x1. Instructions renamed before the ld that read x1 use P8, while later readers of x1 use P0. P8 can therefore be freed only when it can no longer be needed, that is, when the ld (the next writer of x1) commits.
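A small sketch of the bookkeeping behind the use/op/PR/Rd/LPRd/PRd fields: each dispatched instruction records the previous mapping of its destination (LPRd) and returns it to the freelist at commit. The initial rename table (x1→P8, x3→P7, x6→P5, x7→P6) and the freelist order are assumptions chosen so that the first entry matches the ROB row shown above; the function names are hypothetical.

```python
from collections import deque

# Assumed starting state, chosen to match the ROB row for the first ld.
rename_table = {"x1": "P8", "x3": "P7", "x6": "P5", "x7": "P6"}
freelist = deque(["P0", "P1", "P3", "P2", "P4"])
rob = deque()

def dispatch(op, dest, sources):
    """Rename one instruction and append its entry to the ROB."""
    entry = {
        "op": op,
        "PR": [rename_table[s] for s in sources],  # sources are read first
        "Rd": dest,
        "LPRd": rename_table[dest],                # previous mapping, freed at commit
        "PRd": freelist.popleft(),                 # newly allocated destination
    }
    rename_table[dest] = entry["PRd"]
    rob.append(entry)
    return entry

def commit_oldest():
    """Commit the oldest ROB entry and free the register it superseded."""
    entry = rob.popleft()
    freelist.append(entry["LPRd"])
    return entry["LPRd"]

dispatch("ld",   "x1", ["x3"])
dispatch("addi", "x3", ["x1"])
dispatch("sub",  "x6", ["x7", "x6"])
dispatch("add",  "x3", ["x3", "x6"])
dispatch("ld",   "x6", ["x1"])

rows = [(e["op"], e["PR"], e["Rd"], e["LPRd"], e["PRd"]) for e in rob]
for row in rows:
    print(row)

freed = commit_oldest()  # committing the first ld releases P8
print("freed:", freed)
```

Reading LPRd at dispatch is the extra rename-table lookup the slide refers to as the third read port.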










#### Physical Register Management





| Instruction     |
|-----------------|
| ld x1, 0(x3)    |
| addi x3, x1, #4 |
| sub x6, x7, x6  |
| add x3, x3, x6  |
| ld x6, 0(x1)    |

ROB:

| use | ex | op   | p1 | PR1 | p2 | PR2 | Rd | LPRd | PRd |
|-----|----|------|----|-----|----|-----|----|------|-----|
| x   | x  | ld   | p  | P7  |    |     | x1 | P8   | P0  |
| x   |    | addi | p  | P0  |    |     | x3 | P7   | P1  |
| x   |    | sub  | p  | P6  | p  | P5  | x6 | P5   | P3  |
| x   |    | add  |    | P1  |    | P3  | x3 | P1   | P2  |
| x   |    | ld   | p  | P0  |    |     | x6 | P3   | P4  |
|     |    |      |    |     |    |     |    |      |     |

## Execute & Commit

As soon as the first ld writes back, P0 (the new physical register for x1) holds a valid value, so it is marked as present ("p") in the table.

In other words, every instruction waiting on P0 is told that its operand is now available.
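The wake-up step can be sketched as a scan over the ROB that sets the present bit of every source operand naming the register just written. The dictionary layout and the function name `write_back` are assumptions for illustration; entries without a second source are modelled with `PR2 = None` and `p2 = True`.

```python
def write_back(rob, written_register):
    """Set the present bit of every source operand that names `written_register`."""
    woken = []
    for entry in rob:
        for slot in (1, 2):
            if entry[f"PR{slot}"] == written_register and not entry[f"p{slot}"]:
                entry[f"p{slot}"] = True
                woken.append((entry["op"], slot))
    return woken

# State just before the first ld writes P0 (hypothetical encoding of the ROB above).
rob = [
    {"op": "ld",   "p1": True,  "PR1": "P7", "p2": True,  "PR2": None, "PRd": "P0"},
    {"op": "addi", "p1": False, "PR1": "P0", "p2": True,  "PR2": None, "PRd": "P1"},
    {"op": "sub",  "p1": True,  "PR1": "P6", "p2": True,  "PR2": "P5", "PRd": "P3"},
    {"op": "add",  "p1": False, "PR1": "P1", "p2": False, "PR2": "P3", "PRd": "P2"},
    {"op": "ld",   "p1": False, "PR1": "P0", "p2": True,  "PR2": None, "PRd": "P4"},
]

woken = write_back(rob, "P0")
still_waiting = [e["op"] for e in rob if not (e["p1"] and e["p2"])]
print(woken, still_waiting)
```

After the write-back only the `add` is still waiting, because neither P1 nor P3 has been produced yet.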




#### Explicit register renaming (MIPS R10000 Style)



- Physical register file larger than ISA register file
- On issue, each instruction that modifies a register is allocated new physical register from freelist




Note that physical register P0 is "dead" (or not "live") past the point of this load.

When we go to commit the load, we free up




What happens if erroneous speculation?

Current map table: P32 P36 P4 F6 F8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30

Freelist: P38 P40 P44 P48 ... P60 P62

|        | Previous mapping | Instruction     | Done? |
|--------|------------------|-----------------|-------|
| Newest |                  | DIVD P36,P34,P6 | N     |
|        | F10 P10          | ADDD P34,P4,P32 | Y     |
| Oldest | F0 P0            | LD P32,10(R2)   | Y     |

Checkpoint at BNE instruction:

- Map table: P32 P36 P4 F6 F8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
- Freelist: P38 P40 P44 P48 ... P60 P62

Speculation is fixed by restoring the map table and the head of the freelist.
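Checkpoint and restore can be sketched as saving a copy of the map table and freelist at the branch and putting them back on a misprediction. The class `Renamer`, the three-entry map, and the two speculative allocations are assumptions for illustration; the sketch also assumes no instruction commits between the checkpoint and the restore, whereas real hardware restores only the freelist head pointer.

```python
from collections import deque

class Renamer:
    """Toy renamer with per-branch checkpoints (illustrative, not a hardware model)."""

    def __init__(self, mapping, freelist):
        self.mapping = dict(mapping)
        self.freelist = deque(freelist)
        self.checkpoints = {}

    def allocate(self, arch_reg):
        new_phys = self.freelist.popleft()
        self.mapping[arch_reg] = new_phys
        return new_phys

    def checkpoint(self, branch_id):
        # Snapshot both structures; simplification: assumes no commits
        # add registers to the freelist before a possible restore.
        self.checkpoints[branch_id] = (dict(self.mapping), deque(self.freelist))

    def restore(self, branch_id):
        # Undo every speculative mapping made after the branch in one step.
        self.mapping, self.freelist = self.checkpoints.pop(branch_id)

r = Renamer({"F0": "P32", "F2": "P36", "F10": "P34"},
            ["P38", "P40", "P44", "P48"])
r.checkpoint("BNE")
spec1 = r.allocate("F4")   # hypothetical speculative writer after the branch
spec2 = r.allocate("F2")   # another hypothetical speculative writer
r.restore("BNE")           # branch was mispredicted
print(r.mapping, list(r.freelist))
```

No register values are copied during recovery: only the mappings and the freelist change, which is why the slides describe recovery as undoing the table mappings.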

## Explicit Register Renaming

- Tomasulo provides Implicit Register Renaming
  - User registers renamed to reservation station tags
- Explicit Register Renaming:
  - Use physical register file that is larger than number of registers specified by ISA
- Keep a translation table:
  - ISA register => physical register mapping
  - When register is written, replace table entry with new register from freelist.
  - Physical register becomes free when not being used by any instructions in progress.
- Pipeline can be exactly like "standard" DLX pipeline
  - IF, ID, EX, etc....
- Advantages:
  - Removes all WAR and WAW hazards
  - Like Tomasulo, good for allowing full out-of-order completion
  - Allows data to be fetched from a single register file
  - Makes speculative execution/precise interrupts easier:
    - All that needs to be "undone" for precise break point is to undo the table mappings

Question: Can we use explicit register renaming with scoreboard?



## Stages of Scoreboard Control With Explicit Renaming

N.B. Unlike the base scoreboard (without register renaming), the issue stage no longer checks for WAW hazards, because renaming eliminates them.

- Issue—decode instructions & check for structural hazards & allocate new physical register for result
  - Instructions issued in program order (for hazard checking)
  - Don't issue if no free physical registers
  - Don't issue if structural hazard
- Read operands—wait until no hazards, read operands
  - All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
- Execution—operate on operands
  - The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard
- Write result —finish execution

In the base scoreboard we had to check for WAR hazards at this stage; with renaming this is no longer a problem.

Note: No checks for WAR or WAW hazards!
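The issue stage above can be sketched as follows: an instruction issues only if its functional unit and a physical register are free, and nothing checks for WAR or WAW. The function name `issue`, the tuple encoding of instructions, and the explicit unit assignment are assumptions for illustration; the register numbering (F0..F30 mapped to P0..P30, free registers from P32) follows the example used in these slides.

```python
from collections import deque

def issue(instr, rename, freelist, busy_units):
    """Try to issue one instruction; return its renamed form, or None to stall."""
    op, unit, dest, sources = instr
    if unit in busy_units:   # structural hazard
        return None
    if not freelist:         # no free physical register
        return None
    # Integer address registers (R2, R3) are not renamed in this example.
    phys_sources = [rename.get(s, s) for s in sources]
    phys_dest = freelist.popleft()
    rename[dest] = phys_dest
    busy_units.add(unit)
    return (op, phys_dest, phys_sources)

rename = {f"F{i}": f"P{i}" for i in range(0, 32, 2)}
freelist = deque(f"P{i}" for i in range(32, 64, 2))
busy_units = set()

program = [
    ("LD",    "Int1",   "F6",  ["R2"]),
    ("LD",    "Int2",   "F2",  ["R3"]),
    ("MULTD", "Mult1",  "F0",  ["F2", "F4"]),
    ("SUBD",  "Add",    "F8",  ["F6", "F2"]),
    ("DIVD",  "Divide", "F10", ["F0", "F6"]),
]
issued = [issue(i, rename, freelist, busy_units) for i in program]

addd = ("ADDD", "Add", "F6", ["F8", "F2"])
stalled = issue(addd, rename, freelist, busy_units)   # Add unit still busy with SUBD
busy_units.discard("Add")                             # SUBD has written its result
retried = issue(addd, rename, freelist, busy_units)
print(issued, stalled, retried)
```

ADDD stalls only on the structural hazard for the Add unit. Once issued it writes P42, not the P32 that DIVD still has to read, so the WAR hazard on F6 never arises.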

Scoreboard Example

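Instruction status:

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 |       |           |           |              |
| LD          | F2   | 45+ | R3 |       |           |           |              |
| MULTD       | F0   | F2  | F4 |       |           |           |              |
| SUBD        | F8   | F6  | F2 |       |           |           |              |
| DIVD        | F10  | F0  | F6 |       |           |           |              |
| ADDD        | F6   | F8  | F2 |       |           |           |              |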

#### Functional unit status:

| Time Name | Busy | Op | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|----|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |    |           |         |         |         |         |          |          |
| Int2      | No   |    |           |         |         |         |         |          |          |
| Mult1     | No   |    |           |         |         |         |         |          |          |
| Add       | No   |    |           |         |         |         |         |          |          |
| Divide    | No   |    |           |         |         |         |         |          |          |

#### Register Rename and Result

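| Clock |    | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|----|----|-----|-----|-----|-----|
|       | FU | P0 | P2 | P4 | P6 | P8 | P10 | P12 |     | P30 |

Initialized rename table; the free list holds the registers from P32 onward.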



#### Renamed Scoreboard 1

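Instruction status:

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     |           |           |              |
| LD          | F2   | 45+ | R3 |       |           |           |              |
| MULTD       | F0   | F2  | F4 |       |           |           |              |
| SUBD        | F8   | F6  | F2 |       |           |           |              |
| DIVD        | F10  | F0  | F6 |       |           |           |              |
| ADDD        | F6   | F8  | F2 |       |           |           |              |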

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | Yes  | Load | P32       |         | R2      |         |         |          | Yes      |
| Int2      | No   |      |           |         |         |         |         |          |          |
| Mult1     | No   |      |           |         |         |         |         |          |          |
| Add       | No   |      |           |         |         |         |         |          |          |
| Divide    | No   |      |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|----|----|----|-----|----|-----|-----|-----|-----|
| 1     | FU | P0 | P2 | P4 | P32 | P8 | P10 | P12 |     | P30 |

Each instruction allocates a free physical register.

In the slides, a physical register shown in black is marked "p" in the rename table (its value is available); one shown in red is not yet available.

#### Renamed Scoreboard 2

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2 | 1     | 2    |      |        |
| LD          | F2    | 45+  | R3 | 2     |      |      |        |
| MULTD       | F0    | F2   | F4 |       |      |      |        |
| SUBD        | F8    | F6   | F2 |       |      |      |        |
| DIVD        | F10   | F0   | F6 |       |      |      |        |
| ADDD        | F6    | F8   | F2 |       |      |      |        |

#### Functional unit status:

| Time Name | Busy | Op   | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | Yes  | Load | P32 |    | R2 |    |    |              | Yes |
| Int2      | Yes  | Load | P34 |    | R3 |    |    |              | Yes |
| Mult1     | No   |      |     |    |    |    |    |              |     |
| Add       | No   |      |     |    |    |    |    |              |     |
| Divide    | No   |      |     |    |    |    |    |              |     |

#### Register Rename and Result

| Clock |    | F0 | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|----|-----|----|-----|----|-----|-----|-----|-----|
| 2     | FU | P0 | P34 | P4 | P32 | P8 | P10 | P12 |     | P30 |

#### Renamed Scoreboard 3

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         |              |
| LD          | F2   | 45+ | R3 | 2     | 3         |           |              |
| MULTD       | F0   | F2  | F4 | 3     |           |           |              |
| SUBD        | F8   | F6  | F2 |       |           |           |              |
| DIVD        | F10  | F0  | F6 |       |           |           |              |
| ADDD        | F6   | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | Yes  | Load  | P32       |         | R2      |         |         |          | Yes      |
| Int2      | Yes  | Load  | P34       |         | R3      |         |         |          | Yes      |
| Mult1     | Yes  | Multd | P36       | P34     | P4      | Int2    |         | No       | Yes      |
| Add       | No   |       |           |         |         |         |         |          |          |
| Divide    | No   |       |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|----|-----|-----|-----|-----|
| 3     | FU | P36 | P34 | P4 | P32 | P8 | P10 | P12 |     | P30 |

#### Renamed Scoreboard 4

#### Instruction status:

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 3         | 4         |              |
| MULTD       | F0   | F2  | F4 | 3     |           |           |              |
| SUBD        | F8   | F6  | F2 | 4     |           |           |              |
| DIVD        | F10  | F0  | F6 |       |           |           |              |
| ADDD        | F6   | F8  | F2 |       |           |           |              |

As always, a written value can be read only in the clock cycle after the one in which it is written back.

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | Yes  | Load  | P34       |         | R3      |         |         |          | Yes      |
| Mult1     | Yes  | Multd | P36       | P34     | P4      | Int2    |         | No       | Yes      |
| Add       | Yes  | Sub   | P38       | P32     | P34     |         | Int2    | Yes      | No       |
| Divide    | No   |       |           |         |         |         |         |          |          |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 4     | FU | P36 | P34 | P4 | P32 | P38 | P10 | P12 |     | P30 |

#### Renamed Scoreboard 5

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0   | F2  | F4 | 3     |           |           |              |
| SUBD        | F8   | F6  | F2 | 4     |           |           |              |
| DIVD        | F10  | F0  | F6 | 5     |           |           |              |
| ADDD        | F6   | F8  | F2 |       |           |           |              |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| Mult1     | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| Add       | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 5     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 6

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2               | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3               | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4               | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2               | 4     | 6    |      |        |
| DIVD        | F10   | F0   | F6               | 5     |      |      |        |
| ADDD        | F6    | F8   | F2               |       |      |      |        |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| 10 Mult1  | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 2 Add     | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 6     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 7

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+              | R2    | 1    | 2    | 3      | 4     |
| LD          | F2    | 45+              | R3    | 2    | 3    | 4      | 5     |
| MULTD       | F0    | F2               | F4    | 3    | 6    |        |       |
| SUBD        | F8    | F6               | F2    | 4    | 6    |        |       |
| DIVD        | F10   | F0               | F6    | 5    |      |        |       |
| ADDD        | F6    | F8               | F2    |      |      |        |       |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| 9 Mult1   | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 1 Add     | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 7     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 8

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    |        |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 |       |      |      |        |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| 8 Mult1   | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| 0 Add     | Yes  | Sub   | P38       | P32     | P34     |         |         | Yes      | Yes      |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 8     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 9

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 |       |      |      |        |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| 7 Mult1   | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| Add       | No   |       |           |         |         |         |         |          |          |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 9     | FU | P36 | P34 | P4 | P32 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 10

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    |      |      |        |

#### Functional unit status:

| Time Name | Busy | Op    | Fi (dest) | Fj (S1) | Fk (S2) | Qj (FU) | Qk (FU) | Rj (Fj?) | Rk (Fk?) |
|-----------|------|-------|-----------|---------|---------|---------|---------|----------|----------|
| Int1      | No   |       |           |         |         |         |         |          |          |
| Int2      | No   |       |           |         |         |         |         |          |          |
| 6 Mult1   | Yes  | Multd | P36       | P34     | P4      |         |         | Yes      | Yes      |
| Add       | Yes  | Addd  | P42       | P38     | P34     |         |         | Yes      | Yes      |
| Divide    | Yes  | Divd  | P40       | P36     | P32     | Mult1   |         | No       | Yes      |

WAR hazard gone!

#### Register Rename and Result

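| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 10    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

Notice that P32 is no longer listed in the rename table, yet it is still live: it must not be reallocated by accident.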
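The liveness condition for P32 can be checked with a small sketch: a physical register may be reclaimed only if it is no longer in the rename table and no functional unit still names it as a source. The function name `is_reclaimable` and the dictionary encoding of the clock-10 state are assumptions for illustration; the sketch ignores commit ordering and speculation.

```python
# State at clock 10, transcribed from the tables above (hypothetical encoding).
rename_table = {"F0": "P36", "F2": "P34", "F4": "P4", "F6": "P42",
                "F8": "P38", "F10": "P40", "F12": "P12", "F30": "P30"}
pending_sources = {
    "Mult1":  ["P34", "P4"],
    "Add":    ["P38", "P34"],
    "Divide": ["P36", "P32"],   # DIVD still has to read the old F6
}

def is_reclaimable(reg, rename_table, pending_sources):
    """True if `reg` is neither mapped nor named as a source by a busy unit."""
    still_mapped = reg in rename_table.values()
    still_read = any(reg in sources for sources in pending_sources.values())
    return not still_mapped and not still_read

p32_free = is_reclaimable("P32", rename_table, pending_sources)
p6_free = is_reclaimable("P6", rename_table, pending_sources)
print(p32_free, p6_free)
```

P32 is unmapped but still read by the Divide unit, so it stays allocated. P6, the register F6 occupied before the first load, has no remaining reader in this state.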

#### Renamed Scoreboard 11

| Instruction | dest | j   | k  | Issue | Read Oper | Exec Comp | Write Result |
|-------------|------|-----|----|-------|-----------|-----------|--------------|
| LD          | F6   | 34+ | R2 | 1     | 2         | 3         | 4            |
| LD          | F2   | 45+ | R3 | 2     | 3         | 4         | 5            |
| MULTD       | F0    | F2        | F4         | 3     | 6    |      |        |
| SUBD        | F8    | F6        | F2         | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0        | F6         | 5     |      |      |        |
| ADDD        | F6    | F8        | F2         | 10    | 11   |      |        |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| 5 Mult1                 | Yes  | Multd | P36  | P34 | P4  |       |    | Yes | Yes |
| 2 Add                   | Yes  | Addd  | P42  | P38 | P34 |       |    | Yes | Yes |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 11    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 12

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    | 11   |      |        |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| 4 Mult1                 | Yes  | Multd | P36  | P34 | P4  |       |    | Yes | Yes |
| 1 Add                   | Yes  | Addd  | P42  | P38 | P34 |       |    | Yes | Yes |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 12    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 13

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    | 11   | 13   |        |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| 3 Mult1                 | Yes  | Multd | P36  | P34 | P4  |       |    | Yes | Yes |
| 0 Add                   | Yes  | Addd  | P42  | P38 | P34 |       |    | Yes | Yes |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 13    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 14

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD                  | F6  | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD                  | F2  | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD               | F0  | F2   | F4 | 3     | 6    |      |        |
| SUBD                | F8  | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD                | F10 | F0   | F6 | 5     |      |      |        |
| ADDD                | F6  | F8   | F2 | 10    | 11   | 13   | 14     |

| Functional unit status: |      |       | dest | S1  | S2         | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|------------|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk         | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |            |       |    |     |     |
| Int2                    | No   |       |      |     |            |       |    |     |     |
| 2 Mult1                 | Yes  | Multd | P36  | P34 | P4         |       |    | Yes | Yes |
| Add                     | No   |       |      |     |            |       |    |     |     |
| Divide                  | Yes  | Divd  | P40  | P36 | P32        | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 14    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 15

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    |      |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    | 11   | 13   | 14     |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| 1 Mult1                 | Yes  | Multd | P36  | P34 | P4  |       |    | Yes | Yes |
| Add                     | No   |       |      |     |     |       |    |     |     |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 15    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 16

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4 | 3     | 6    | 16   |        |
| SUBD        | F8    | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6 | 5     |      |      |        |
| ADDD        | F6    | F8   | F2 | 10    | 11   | 13   | 14     |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| 0 Mult1                 | Yes  | Multd | P36  | P34 | P4  |       |    | Yes | Yes |
| Add                     | No   |       |      |     |     |       |    |     |     |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | No  | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 16    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 17

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD                  | F6  | 34+  | R2 | 1     | 2    | 3    | 4      |
| LD                  | F2  | 45+  | R3 | 2     | 3    | 4    | 5      |
| MULTD               | F0  | F2   | F4 | 3     | 6    | 16   | 17     |
| SUBD                | F8  | F6   | F2 | 4     | 6    | 8    | 9      |
| DIVD                | F10 | F0   | F6 | 5     |      |      |        |
| ADDD                | F6  | F8   | F2 | 10    | 11   | 13   | 14     |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| Mult1                   | No   |       |      |     |     |       |    |     |     |
| Add                     | No   |       |      |     |     |       |    |     |     |
| Divide                  | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | Yes | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 17    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |

#### Renamed Scoreboard 18

| Instruction status: |     |      |    |       | Read | Exec | Write  |
|---------------------|-----|------|----|-------|------|------|--------|
| Instruction         |     | j    | k  | Issue | Oper | Comp | Result |
| LD          | F6    | 34+  | R2               | 1     | 2    | 3    | 4      |
| LD          | F2    | 45+  | R3               | 2     | 3    | 4    | 5      |
| MULTD       | F0    | F2   | F4               | 3     | 6    | 16   | 17     |
| SUBD        | F8    | F6   | F2               | 4     | 6    | 8    | 9      |
| DIVD        | F10   | F0   | F6               | 5     | 18   |      |        |
| ADDD        | F6    | F8   | F2               | 10    | 11   | 13   | 14     |

| Functional unit status: |      |       | dest | S1  | S2  | FU    | FU | Fj? | Fk? |
|-------------------------|------|-------|------|-----|-----|-------|----|-----|-----|
| Time Name               | Busy | Op    | Fi   | Fj  | Fk  | Qj    | Qk | Rj  | Rk  |
| Int1                    | No   |       |      |     |     |       |    |     |     |
| Int2                    | No   |       |      |     |     |       |    |     |     |
| Mult1                   | No   |       |      |     |     |       |    |     |     |
| Add                     | No   |       |      |     |     |       |    |     |     |
| 40 Divide               | Yes  | Divd  | P40  | P36 | P32 | Mult1 |    | Yes | Yes |

#### Register Rename and Result

| Clock |    | F0  | F2  | F4 | F6  | F8  | F10 | F12 | ••• | F30 |
|-------|----|-----|-----|----|-----|-----|-----|-----|-----|-----|
| 18    | FU | P36 | P34 | P4 | P42 | P38 | P40 | P12 |     | P30 |


## Register renaming vs. ROB

- Instruction commit is simpler than with a ROB;
- Deallocating registers is more complex;
- Dynamic mapping of architectural to physical registers complicates design and debugging;
- Used in PowerPC 603/604, Pentium II, III, and 4, MIPS R10000/R12000, Alpha 21264, and Sandy Bridge
  - 20 to 80 extra physical registers are added.
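Why deallocation is the harder part can be seen in a minimal sketch of the rule stated earlier: a physical register becomes free only when no active instruction still uses it. The class `PhysicalRegisterTracker` and its method names are hypothetical, and the sketch ignores commit ordering and precise exceptions, which a real design also has to respect.

```python
class PhysicalRegisterTracker:
    """Frees a physical register once it is unmapped and has no pending readers."""

    def __init__(self):
        self.pending_readers = {}  # physical register -> issued instructions yet to read it
        self.unmapped = set()      # registers no longer named by the rename table
        self.free_list = []

    def add_reader(self, preg):
        """An issued instruction will read `preg` at some later cycle."""
        self.pending_readers[preg] = self.pending_readers.get(preg, 0) + 1

    def unmap(self, preg):
        """A later writer has taken over the architectural register."""
        self.unmapped.add(preg)
        self._maybe_free(preg)

    def reader_done(self, preg):
        """A pending reader has read its operand."""
        self.pending_readers[preg] -= 1
        self._maybe_free(preg)

    def _maybe_free(self, preg):
        if preg in self.unmapped and self.pending_readers.get(preg, 0) == 0:
            self.unmapped.discard(preg)
            self.free_list.append(preg)


tracker = PhysicalRegisterTracker()
tracker.add_reader("P32")   # DIVD (issued at clock 5) will read F6 from P32
tracker.unmap("P32")        # ADDD (issued at clock 10) remaps F6 to P42
still_live = "P32" not in tracker.free_list
tracker.reader_done("P32")  # DIVD reads its operands at clock 18
freed = tracker.free_list == ["P32"]
print(still_live, freed)
```

The two calls in the middle mirror the scoreboard example: P32 drops out of the rename table at clock 10 but cannot be recycled until DIVD has read it.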

## Summary

- Explicit Renaming: more physical registers than needed by ISA.
  - Rename table: tracks current association between architectural registers and physical registers
  - Uses a translation table to perform compiler-like transformation on the fly
- With Explicit Renaming:
  - All registers are concentrated in a single register file
  - Can use a bypass network that looks more like that of a 5-stage pipeline
  - Introduces a register-allocation problem
    - Branch misprediction and precise exceptions must be handled differently, but this ultimately makes things simpler
- For precise exceptions and branch prediction:
  - Clearly need something like a reorder buffer

We want to go beyond CPI = 1, that is, to complete more than one instruction per cycle. This can only be done with multiple issue.

## Multiple issue

- The issue logic must handle two or more instructions at once, including possible dependences between them
- This is the most fundamental bottleneck in dynamically scheduled superscalar processors
  - Logic is needed to issue every possible combination of dependent instructions in the same clock cycle
  - Since the number of possibilities grows with the square of the number of instructions that can be issued in one clock cycle, it is difficult to implement issue logic for more than 4 instructions
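The quadratic growth can be made concrete by counting register comparisons. This is a rough sketch under two assumptions that are not from the slides: every instruction has two source registers, and each source is compared against the destination of every older instruction in the same bundle. The function name is illustrative.

```python
def intra_bundle_comparisons(issue_width, sources_per_instruction=2):
    """Count source-vs-older-destination register comparisons in one bundle."""
    total = 0
    for position in range(issue_width):
        # The instruction at `position` has `position` older instructions in the bundle
        total += sources_per_instruction * position
    return total


counts = {width: intra_bundle_comparisons(width) for width in (1, 2, 4, 8)}
print(counts)
```

With two sources per instruction the count is w(w - 1) for issue width w: 2 comparisons for a 2-wide bundle, 12 for 4-wide, and 56 for 8-wide, all of which must be resolved within one clock cycle.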

## Basic strategy for multiple issue

- Assign a reservation station and a reorder buffer entry to every instruction in the next issue bundle
  - If not enough entries are available, only a subset of the instructions, taken in sequential order, is considered
- Analyze all dependences among the instructions
- If an instruction in the bundle depends on an earlier instruction in the same bundle, use the assigned reorder buffer number to update the reservation table for the dependent instruction.

#### All this is done in parallel in a single clock cycle!

- In addition, we need to be able to commit multiple instructions in a single clock cycle
- The Intel Core i7 uses a similar technique



## Multiple instruction issue with register renaming

- The issue logic pre-reserves enough physical registers for the entire issue bundle
- The issue logic determines what dependences exist in the bundle.
  - If no dependence exists within the bundle, the register renaming structure is used to determine the physical register that holds, or will hold, the result on which the instruction depends. In this case the result comes from an earlier issue bundle, so the register renaming table already has the correct register number
  - If an instruction depends on an instruction that is earlier in the bundle, the pre-reserved physical register in which that result will be placed is used to update the information for the issuing instruction
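The steps above can be sketched in software as follows. Hardware performs all of this in parallel, whereas the loop below is sequential; the function `rename_bundle`, the tuple layout, the identity initial mapping, and the two-instruction bundle are hypothetical choices made for illustration.

```python
def rename_bundle(bundle, rename_table, free_list):
    """Rename one issue bundle; each entry is (op, dest, sources)."""
    if len(free_list) < len(bundle):
        raise RuntimeError("not enough free physical registers for the bundle")
    # Pre-reserve one physical destination register per instruction
    reserved = [free_list.pop(0) for _ in bundle]

    renamed = []
    for i, (op, dest, sources) in enumerate(bundle):
        phys_sources = []
        for src in sources:
            # The newest older producer in the same bundle wins, if there is one
            producer = next(
                (reserved[j] for j in range(i - 1, -1, -1) if bundle[j][1] == src),
                None,
            )
            # Otherwise the value comes from an earlier bundle: use the rename table
            phys_sources.append(producer if producer is not None else rename_table[src])
        renamed.append((op, reserved[i], tuple(phys_sources)))

    # Publish the new mappings in program order so the youngest writer wins
    for (_, dest, _), preg in zip(bundle, reserved):
        rename_table[dest] = preg
    return renamed


# Assumed initial state: identity mapping, free list P32, P34, ...
rename_table = {f"F{i}": f"P{i}" for i in range(0, 32, 2)}
free_list = [f"P{i}" for i in range(32, 64, 2)]
bundle = [
    ("MULTD", "F0", ("F2", "F4")),
    ("DIVD", "F10", ("F0", "F6")),
]
renamed = rename_bundle(bundle, rename_table, free_list)
print(renamed)
```

In this hypothetical bundle, DIVD's F0 operand is resolved to the register pre-reserved for MULTD (P32), while its F6 operand comes from the rename table.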

All this is done in parallel in a single clock cycle!



### Superscalar Register Renaming: two-issue

- During decode, each instruction is allocated a new physical destination register
- Source operands renamed to physical register with newest value
- Execution unit only sees physical register numbers





The MIPS R10K renames 4 serially RAW-dependent instructions per cycle

## Speculation and energy efficiency

- Speculation raises power consumption, but if it lowers execution time by more than it raises average power consumption, total energy can still go down
  - Whether total energy is lower depends on the number of instructions executed incorrectly
  - Experimental results show that misspeculation is small on average in scientific code but significant (30% on average) in integer code
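The trade-off can be written as a ratio, assuming energy is average power multiplied by execution time. The symbols below are introduced here for illustration and do not appear in the slides.

$$
\frac{E_{\text{spec}}}{E_{\text{base}}}
= \frac{\bar{P}_{\text{spec}}}{\bar{P}_{\text{base}}} \cdot \frac{T_{\text{spec}}}{T_{\text{base}}}
$$

Speculation saves energy when this ratio is below 1. As a purely hypothetical example, a 20% increase in average power combined with a 25% reduction in execution time gives $1.2 \times 0.75 = 0.9$, that is, 10% less energy.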

## Summary

- Modern computer architects predict everything:
  - Branches, data dependences, data!
- Fairly simple hardware structures can do a good job of predicting branches:
  - Branch Target Buffer (BTB) identifies branches and branch offsets
  - Branch History Table (BHT) does prediction
- More sophisticated prediction: correlation
  - Different branches depend on one another!

## Summary

- Explicit Renaming: more physical registers than ISA registers
  - Separates renaming from scheduling
    - Opens up lots of options for resolving RAW hazards
  - Rename table: tracks current association between architectural registers and physical registers
  - Potentially complicated rename table management
- Parallelism hard to get from real hardware