

# Computer Architecture

Zhou Xuehai (周学海) xhzhou@ustc.edu.cn 0551-63492149 University of Science and Technology of China



### Review

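#### BHT-based predictors:

- Basic 2-bit predictor:
- Global predictor:
  - Each branch has several m-bit predictors
  - Each possible outcome pattern of the most recent n branches selects one of these predictors
- Local predictor:
  - Each branch has several m-bit predictors
  - Each possible outcome pattern of this branch's most recent n executions selects one of these predictors
- Tournament predictor:
  - Chooses the most suitable prediction among the results of several predictors.
  - Example: a two-level global predictor combined with a two-level local predictor
- Optimizing instruction fetch bandwidth
  - BTB-based branch prediction
  - Return Address Stack
  - Integrated, standalone instruction fetch unit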



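### Basic structure, inputs, and outputs of a BHT predictor



- Select a predictor using the branch history (and the PC)
- The state of the selected predictor determines the prediction (the output)
- Update the predictor's state with the actual outcomes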
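
The select / predict / update steps above can be sketched in a few lines. This is only an illustration, assuming a global history of n outcomes, 2-bit saturating counters, and a table indexed by low PC bits; the class name `TwoLevelPredictor` and its parameters are hypothetical and do not come from the slides.

```python
class TwoLevelPredictor:
    """Sketch of a global-history predictor built from 2-bit saturating counters."""

    def __init__(self, history_bits=2, index_bits=4):
        self.history_bits = history_bits
        self.index_bits = index_bits
        self.history = 0  # outcomes of the last n branches, newest in bit 0
        # One 2-bit counter per (PC index, history pattern); 1 = weakly not taken.
        self.counters = [[1] * (1 << history_bits) for _ in range(1 << index_bits)]

    def _row(self, pc):
        # Assumption: word-aligned instructions, so the two low PC bits are dropped.
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        # The history (and PC) select a counter; its state gives the prediction.
        return self.counters[self._row(pc)][self.history] >= 2

    def update(self, pc, taken):
        # Update the selected counter with the actual outcome, then shift the history.
        row = self._row(pc)
        c = self.counters[row][self.history]
        self.counters[row][self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


predictor = TwoLevelPredictor()
pattern = [True, False] * 20  # a branch that alternates taken / not taken
correct = 0
for outcome in pattern:
    correct += predictor.predict(0x400) == outcome
    predictor.update(0x400, outcome)
print(f"correct predictions: {correct} of {len(pattern)}")
```

After a short warm-up the history pattern alone identifies the next outcome, which is the behavior a single 2-bit counter cannot capture for an alternating branch.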



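### 5.4 Branch prediction techniques

# Performance impact of control dependences

#### BHT-based branch prediction

Predicts the branch direction

- 1. Basic 2-bit predictor
- 2. Correlating predictor (two-level predictor)
- 3. Tournament (combined) predictor

### BTB-based branch prediction

Predicts the target address

- 1. Branch target buffer
- 2. Return address predictor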



### Branch Target Buffer (BTB)

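- The BTB is a small cache
- The address of the branch instruction indexes the BTB to obtain the predicted target address
  - The branch instruction's address must be checked for a match, so that a wrong target address is not used
  - The predicted address is read from the table
  - Once the branch direction is resolved, the predicted PC is updated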
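
A minimal sketch of the lookup and update described above, assuming a direct-mapped buffer that stores only taken branches and 4-byte instructions. The class `BranchTargetBuffer` and its field layout are illustrative assumptions, not a description of any particular hardware.

```python
class BranchTargetBuffer:
    """Sketch of a direct-mapped BTB that stores only taken branches."""

    def __init__(self, index_bits=4):
        self.index_bits = index_bits
        self.entries = [None] * (1 << index_bits)  # each entry: (tag, target)

    def _split(self, pc):
        word = pc >> 2  # assumption: 4-byte, word-aligned instructions
        return word & ((1 << self.index_bits) - 1), word >> self.index_bits

    def lookup(self, pc):
        """Return the predicted target, or None on a miss (fetch falls through)."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:  # tag check avoids aliasing
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """Called once the branch direction is resolved."""
        index, tag = self._split(pc)
        if taken:
            self.entries[index] = (tag, target)  # insert or refresh the entry
        elif self.entries[index] is not None and self.entries[index][0] == tag:
            self.entries[index] = None  # branch fell through: drop its entry


btb = BranchTargetBuffer()
first = btb.lookup(0x1000)              # miss: branch not seen yet
btb.update(0x1000, taken=True, target=0x2000)
second = btb.lookup(0x1000)             # hit: predicted target
alias = btb.lookup(0x1000 + (1 << 6))   # same index, different tag: miss
```

The `alias` lookup shows why the tag comparison matters: without it, an unrelated instruction that maps to the same index would be redirected to 0x2000.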





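# BTB organization



- The BTB is essentially a cache
- Several organizations are possible, with different cost and performance
  - Direct-mapped
  - Set-associative
- Cache-organization optimizations specific to the BTB
  - Example: shorten the tag (store only some of the tag bits, or compress the tag with a computation)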









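Example: basic model

- A simple five-stage pipeline
- Whether the branch is taken is determined in the ID stage
- BTB predictor: entries are inserted into and evicted from the branch target buffer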

Figure 3.22 The steps involved in handling an instruction with a branch-target buffer.



| Instruction in buffer | Prediction | Actual branch | Penalty cycles |
|-----------------------|------------|---------------|----------------|
| yes                   | taken      | taken         | 0              |
| yes                   | taken      | not taken     | 2              |
| no                    |            | taken         | 2              |
| no                    |            | not taken     | 0              |

Figure 2.24 Penalties for all possible combinations of whether the branch is in the buffer and what it actually does, assuming we store only taken branches in the buffer. There is no branch penalty if everything is correctly predicted and the branch is found in the target buffer. If the branch is not correctly predicted, the penalty is equal to 1 clock cycle to update the buffer with the correct information (during which an instruction cannot be fetched) and 1 clock cycle, if needed, to restart fetching the next correct instruction for the branch. If the branch is not found and taken, a 2-cycle penalty is encountered, during which time the buffer is updated.

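Branch penalty: if the branch hits in the BTB and the prediction is correct, the penalty is 0. A hit with a wrong prediction, or a miss on a taken branch, costs 2 cycles. A miss on a not-taken branch costs nothing.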





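Figure 4.37 On a BTB miss, stalling the pipeline introduces bubbles

Assume the misprediction penalties for the different cases are as listed in the table above. Determine the total branch penalty of BTB-based prediction under the following conditions:

- For branches that hit in the BTB, the prediction accuracy is 90%
- For branches predicted taken, the hit rate in the BTB is 90%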
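
One way to work this exercise is to add up the probabilities of the two penalized cases in the table and multiply by the 2-cycle penalty. The sketch below does that; it assumes, as a simplification, that every branch missing in the buffer is counted as taken. The function name and the `miss_taken_fraction` parameter are hypothetical and exist only to make that assumption explicit.

```python
def btb_branch_penalty(hit_rate, accuracy, miss_taken_fraction=1.0, penalty_cycles=2):
    """Expected penalty cycles per branch for the cases in the penalty table."""
    # In the buffer (predicted taken) but actually not taken: 2-cycle penalty.
    p_hit_wrong = hit_rate * (1.0 - accuracy)
    # Not in the buffer but actually taken: 2-cycle penalty.
    # Assumption: miss_taken_fraction = 1.0 treats all missing branches as taken.
    p_miss_taken = (1.0 - hit_rate) * miss_taken_fraction
    return (p_hit_wrong + p_miss_taken) * penalty_cycles


penalty = btb_branch_penalty(hit_rate=0.90, accuracy=0.90)
print(f"branch penalty = {penalty:.2f} cycles per branch")  # prints 0.38
```

With both rates at 90%, the two cases contribute 0.09 and 0.10, so the expected cost is (0.09 + 0.10) × 2 = 0.38 cycles per branch under these assumptions.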



### Return Address Predictors

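- A challenge for speculative execution: predicting indirect jumps
  - The branch target address is known only at run time
- Most indirect jumps come from procedure returns
  - With a BTB, the prediction accuracy for procedure returns is low
  - On the SPEC CPU95 benchmarks, the accuracy for this kind of branch is below 60%
- Use a small buffer (a stack) to hold return addresses
  - On a procedure call, push the return address onto the stack
  - On a procedure return, pop the stack to obtain the target address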
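
The push-on-call, pop-on-return behavior can be sketched as follows. The fixed capacity and the overflow policy (discard the oldest entry) are assumptions made for illustration; the class name `ReturnAddressStack` is hypothetical.

```python
class ReturnAddressStack:
    """Sketch of a fixed-size return address stack (RAS)."""

    def __init__(self, size=8):
        self.size = size
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.size:
            self.stack.pop(0)  # assumption: overflow discards the oldest entry
        self.stack.append(return_address)

    def on_return(self):
        # An empty stack yields no prediction; a real design would fall back
        # to another predictor such as the BTB.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(size=2)
for call_pc in (0x100, 0x200, 0x300):  # three nested calls, 4-byte instructions
    ras.on_call(call_pc + 4)
predictions = [ras.on_return() for _ in range(3)]
```

With only two entries, the two innermost returns are predicted correctly (0x304, then 0x204) and the outermost one gets no prediction, which is why accuracy depends on the call depth relative to the buffer size.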





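### Return Address Predictors: example





Figure 4.41 Branch prediction for CALL/Return instructions



Figure 4.40 Contents of the RAS after executing three CALL instructions



Figure 4.42 Storing the instruction type in the BTB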



### Return Address Buffer entries



**Figure 3.24** Prediction accuracy for a return address buffer operated as a stack on a number of SPEC CPU95 benchmarks. The accuracy is the fraction of return addresses predicted correctly. A buffer of 0 entries implies that the standard branch prediction is used. Since call depths are typically not large, with some exceptions, a modest buffer works well. These data come from Skadron et al. [1999] and use a fix-up mechanism to prevent corruption of the cached return addresses.

#### Relationship between the number of entries in the return address buffer and prediction accuracy



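### Other methods for predicting indirect-jump target addresses

#### Case (a)

1: jump to target address 1

2: jump to target address 2

3: jump to target address 3

...

9: jump to target address 1



Figure 4.48 Predicting the target address with a local-history-based branch prediction method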



### Instruction Fetch Unit



FIGURE 3.1: Example fetch pipeline.



# Branch prediction summary



Figure 4.49 A complete branch prediction scheme



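### Chapter 5 Instruction-Level Parallelism

#### 5.1 Basic concepts of instruction-level parallelism and static instruction scheduling

ILP and its challenges; software methods for exploiting instruction-level parallelism; instruction-level parallelism within a basic block

#### 5.2 Hardware methods for exploiting instruction-level parallelism (4 class hours)

5.2-1 Dynamic instruction scheduling, method 1: Scoreboard

5.2-2 Dynamic instruction scheduling, method 2: Tomasulo

#### 5.3 Branch prediction methods

5.4 Hardware-based speculation: 3.6

5.5-1 Resolving memory access conflicts

5.5-2 Multiple-issue techniques

5.6 Multithreading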



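### 5.4 Speculative execution

#### Tomasulo with support for speculative execution

- 1. Machine structure with a ROB
- 2. Description of the four-stage algorithm

### Code execution examples

- 1. Simple code example
- 2. Speculative execution example

### Tomasulo summary

- 1. Role of the ROB
- 2. Dynamic memory disambiguation

#### Recovery from branch misprediction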



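### Basic structure for speculative execution using Tomasulo's algorithm



#### Main differences:

- A reorder buffer is added
- The store buffer is removed; its function is integrated into the ROB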

Figure 3.15 The basic structure of a FP unit using Tomasulo's algorithm and extended to handle speculation. Comparing this to Figure 3.10 on page 198, which implemented Tomasulo's algorithm, we can see that the major change is the addition of the ROB and the elimination of the store buffer, whose function is integrated into the ROB. This mechanism can be extended to allow multiple issues per clock by making the CDB wider to allow for multiple completions per clock.




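### Hardware support for speculative execution and precise exceptions

- Requirement for speculative execution: the ability to "recover"
- Hardware buffers the results of instructions that have not committed: the reorder buffer (ROB)
  - 4 fields: instruction type, destination, value, ready
  - The reorder buffer can serve as an operand source, much like having more registers (similar to the RS)
  - When an instruction finishes execution, its result is tagged with the ROB entry number instead of the reservation station number
  - An instruction commit stage (Commit) is added
  - The ROB supplies operands in the interval between the end of execution and commit
  - Once a result commits, it is written to the register
  - On a misprediction, speculatively executed instructions are easy to undo; on an exception, the state is easy to restore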
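
A small sketch of the ROB as a data structure, using the four fields listed above. It only models allocation, in-order commit of register results, and flushing; stores and branches are deliberately left out. The names `ROBEntry` and `ReorderBuffer` are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ROBEntry:
    kind: str            # instruction type, e.g. "alu", "load", "store", "branch"
    dest: object         # register name (or memory address for a store)
    value: object = None
    ready: bool = False


class ReorderBuffer:
    def __init__(self, size=4):
        self.size = size
        self.entries = deque()

    def allocate(self, kind, dest):
        if len(self.entries) == self.size:
            return None                      # ROB full: issue must stall
        entry = ROBEntry(kind, dest)
        self.entries.append(entry)
        return entry

    def commit(self, regs):
        """Retire the head entry if its result is ready; return True on success."""
        if not self.entries or not self.entries[0].ready:
            return False                     # in order: only the head may commit
        head = self.entries.popleft()
        if head.kind not in ("store", "branch"):
            regs[head.dest] = head.value     # the register is written only now
        return True

    def flush(self):
        self.entries.clear()                 # misprediction or exception


regs = {"F0": 0.0, "F2": 0.0}
rob = ReorderBuffer()
older = rob.allocate("alu", "F0")
younger = rob.allocate("alu", "F2")
younger.value, younger.ready = 2.5, True     # finishes out of order
blocked = rob.commit(regs)                   # head is not ready yet
older.value, older.ready = 1.5, True
committed = [rob.commit(regs), rob.commit(regs)]
```

The younger instruction finishes first but cannot update `F2` until the older one commits, which is what keeps the architectural state recoverable.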




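### The four stages of Tomasulo's algorithm with speculation

- 1. Issue: get instruction from FP Op Queue
  - If a reservation station and a ROB entry are both free, issue the instruction. If the source operands are available in the registers or in the ROB, send them to the RS; the ROB entry number of the destination is also sent to the RS
- 2. Execution: operate on operands (EX)
  - Start executing once the operands are ready. If they are not ready, monitor the CDB for the results; this resolves RAW hazards
- 3. Write result: finish execution (WB)
  - Broadcast the result on the CDB to all FUs waiting for it and to the ROB entry; mark the RS as available
- 4. Commit: update register with reorder result
  - In ROB order: when the instruction at the head has its result, update the register (or memory) and remove the instruction from the ROB
  - On a misprediction or an exception (interrupt), flush the ROB
  - See p. 191, Figure 3.14 (English edition); p. 141, Figure 3-9 (Chinese edition)
- CDB conflicts must be detected during execution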



### Issue

| Status | Wait until | Action or bookkeeping |
|--------|------------|-----------------------|
| Issue: all instructions | Reservation station (r) and ROB (b) both available | <pre>if (RegisterStat[rs].Busy) /* in-flight instr. writes rs */ {h ← RegisterStat[rs].Reorder; if (ROB[h].Ready) /* instr. completed already */ {RS[r].Vj ← ROB[h].Value; RS[r].Qj ← 0;} else {RS[r].Qj ← h;} /* wait for instruction */ } else {RS[r].Vj ← Regs[rs]; RS[r].Qj ← 0;}; RS[r].Busy ← yes; RS[r].Dest ← b; ROB[b].Instruction ← opcode; ROB[b].Dest ← rd; ROB[b].Ready ← no;</pre> |
| Issue: FP operations and stores | | <pre>if (RegisterStat[rt].Busy) /* in-flight instr. writes rt */ {h ← RegisterStat[rt].Reorder; if (ROB[h].Ready) /* instr. completed already */ {RS[r].Vk ← ROB[h].Value; RS[r].Qk ← 0;} else {RS[r].Qk ← h;} /* wait for instruction */ } else {RS[r].Vk ← Regs[rt]; RS[r].Qk ← 0;};</pre> |
| Issue: FP operations | | RegisterStat[rd].Reorder ← b; RegisterStat[rd].Busy ← yes; ROB[b].Dest ← rd; |
| Issue: loads | | RS[r].A ← imm; RegisterStat[rt].Reorder ← b; RegisterStat[rt].Busy ← yes; ROB[b].Dest ← rt; |
| Issue: stores | | RS[r].A ← imm; |

rs: source-operand register of an FP operation; base register of a load/store

rt: source-operand register of an FP operation; the register whose value a store writes to memory; destination register of a load

h: ROB entry number of the instruction that the current instruction depends on

b: ROB entry number of the current instruction; r: reservation station number of the current instruction
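
The operand-read part of the Issue step can be sketched as one helper that returns the (V, Q) pair for a source register. The dictionaries standing in for `Regs`, `RegisterStat`, and `ROB`, and the helper name `read_operand`, are assumptions for illustration; as in the table, Q = 0 means "no producer to wait for".

```python
def read_operand(reg, regs, register_stat, rob):
    """Return (V, Q): the value and 0, or None and the ROB number to wait for."""
    h = register_stat.get(reg)          # ROB entry that will write reg, if any
    if h is not None:                   # an in-flight instruction writes reg
        if rob[h]["ready"]:             # producer finished but has not committed
            return rob[h]["value"], 0
        return None, h                  # wait for ROB entry h on the CDB
    return regs[reg], 0                 # no producer in flight: read the register


# Issue of MULT F0, F2, F4 while the load of F2 (ROB entry 2) is still pending.
regs = {"F2": 0.0, "F4": 4.0}
register_stat = {"F2": 2}
rob = {2: {"ready": False, "value": None}}
vj, qj = read_operand("F2", regs, register_stat, rob)
vk, qk = read_operand("F4", regs, register_stat, rob)
```

Here the first operand ends up as Qj = 2 with no value, and the second as Vk = Regs[F4] with Qk = 0; the numeric register contents are made up for the example.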



### Execute

| Status | Wait until | Action or bookkeeping |
|--------|------------|-----------------------|
| Execute: FP op | (RS[r].Qj == 0) and (RS[r].Qk == 0) | Compute results; operands are in Vj and Vk |
| Load step 1 | (RS[r].Qj == 0) and there are no stores earlier in the queue | RS[r].A ← RS[r].Vj + RS[r].A; |
| Load step 2 | Load step 1 done and all stores earlier in ROB have different address | Read from Mem[RS[r].A] |
| Store | (RS[r].Qj == 0) and store at queue head | ROB[h].Address ← RS[r].Vj + RS[r].A; |

h: ROB entry number of the store, which is at the head of the store queue



### Write result & Commit

```
Write result (all but store)
  Wait until: Execution done at r and CDB available
  Action or bookkeeping:
    b ← RS[r].Dest; RS[r].Busy ← no;
    ∀x (if (RS[x].Qj==b) {RS[x].Vj ← result; RS[x].Qj ← 0});
    ∀x (if (RS[x].Qk==b) {RS[x].Vk ← result; RS[x].Qk ← 0});
    ROB[b].Value ← result; ROB[b].Ready ← yes;

Write result (store)
  Wait until: Execution done at r and (RS[r].Qk == 0)
  Action or bookkeeping:
    ROB[h].Value ← RS[r].Vk;
  h: ROB entry number of the store

Commit
  Wait until: Instruction is at the head of the ROB (entry h) and ROB[h].ready == yes
  Action or bookkeeping:
    d ← ROB[h].Dest; /* register dest, if exists */
    if (ROB[h].Instruction==Branch)
       {if (branch is mispredicted)
          {clear ROB[h], RegisterStat; fetch branch dest;};}
    else if (ROB[h].Instruction==Store)
       {Mem[ROB[h].Destination] ← ROB[h].Value;}
    else /* put the result in the register destination */
       {Regs[d] ← ROB[h].Value;};
    ROB[h].Busy ← no; /* free up ROB entry */
    /* free up dest register if no one else writing it */
    if (RegisterStat[d].Reorder==h) {RegisterStat[d].Busy ← no;};
  h: head of the ROB entry
```
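
The write-result broadcast and the register part of commit can be made runnable with plain dictionaries. This is a sketch under simplifying assumptions: only register-writing instructions are handled, and the layouts of `rs`, `rob`, and `register_stat` are hypothetical stand-ins for the hardware tables.

```python
def write_result(r, result, rs, rob):
    """Broadcast the result of reservation station r on the CDB."""
    b = rs[r]["dest"]
    rs[r]["busy"] = False
    for station in rs.values():            # every waiting station snoops the CDB
        if station["qj"] == b:
            station["vj"], station["qj"] = result, 0
        if station["qk"] == b:
            station["vk"], station["qk"] = result, 0
    rob[b]["value"], rob[b]["ready"] = result, True


def commit_register(h, rob, regs, register_stat):
    """Commit a register-writing instruction at the ROB head (entry h)."""
    d = rob[h]["dest"]
    regs[d] = rob[h]["value"]
    rob[h]["busy"] = False                 # free up the ROB entry
    if register_stat.get(d) == h:          # free d only if no later writer exists
        del register_stat[d]


# SUBD (Add1, ROB entry 4) finishes while ADDD (Add2, ROB entry 6) waits on entry 4.
rs = {
    "Add1": {"busy": True, "vj": 3.0, "vk": 2.0, "qj": 0, "qk": 0, "dest": 4},
    "Add2": {"busy": True, "vj": None, "vk": 2.0, "qj": 4, "qk": 0, "dest": 6},
}
rob = {
    4: {"dest": "F8", "value": None, "ready": False, "busy": True},
    6: {"dest": "F6", "value": None, "ready": False, "busy": True},
}
regs = {"F8": 0.0}
register_stat = {"F8": 4, "F6": 6}
write_result("Add1", 1.0, rs, rob)
commit_register(4, rob, regs, register_stat)
```

The operand values are made up; the point is that Add2 receives its missing operand from the broadcast, while `Regs[F8]` changes only at commit.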






### Example:

LD F6, 34(R2)
LD F2, 45(R3)
MULT F0, F2, F4
SUBD F8, F6, F2
DIVD F10, F0, F6
ADDD F6, F8, F2

Assumption: number of cycles spent in the execute stage

LD: 1 cycle, MULT: 10 cycles

SUBD/ADDD: 2 cycles, DIVD: 40 cycles
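
Before stepping through the cycles, it can help to list the data dependences in the six-instruction sequence, since these are what the reservation stations and the ROB have to track. The sketch below does a simple pairwise, name-based comparison; the tuple encoding of the program and the function name `find_hazards` are assumptions made for illustration.

```python
program = [
    ("LD",   "F6",  ["R2"]),
    ("LD",   "F2",  ["R3"]),
    ("MULT", "F0",  ["F2", "F4"]),
    ("SUBD", "F8",  ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6",  ["F8", "F2"]),
]


def find_hazards(program):
    """Return (kind, earlier, later, register) tuples, instructions numbered from 1."""
    hazards = []
    for j, (_, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dest_i, srcs_i = program[i]
            if dest_i in srcs_j:
                hazards.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:
                hazards.append(("WAR", i + 1, j + 1, dest_j))
            if dest_j == dest_i:
                hazards.append(("WAW", i + 1, j + 1, dest_j))
    return hazards


hazards = find_hazards(program)
for kind, earlier, later, reg in hazards:
    print(f"{kind}: instruction {later} vs instruction {earlier} on {reg}")
```

The only name dependences are on F6: ADDD rewrites the register that the first LD wrote and that SUBD and DIVD read. Renaming through ROB entry numbers is what lets ADDD issue without waiting for them.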



Cycle 0

Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction | State | Destination | Value |
|-------|------|-------------|-------|-------------|-------|
| 1     |      |             |       |             |       |
| 2     |      |             |       |             |       |
| 3     |      |             |       |             |       |
| 4     |      |             |       |             |       |
| 5     |      |             |       |             |       |
| 6     |      |             |       |             |       |
| 7     |      |             |       |             |       |
| 8     |      |             |       |             |       |
| 9     |      |             |       |             |       |
| 10    |      |             |       |             |       |

Load Buffers

|       | Busy | Address |
|-------|------|---------|
| Load1 |      |         |
| Load2 |      |         |
| Load3 |      |         |

Register Status

|          | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|----|----|-----|-----|-----|-----|
| Reorder# |    |    |    |    |    |     |     |     |     |
| Busy     | No | No | No | No | No | No  | No  |     | No  |



Cycle 1

Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction   | State | Destination | Value |
|-------|------|---------------|-------|-------------|-------|
| 1     | Yes  | LD F6, 34(R2) | Issue | F6          |       |
| 2     |      |               |       |             |       |
| 3     |      |               |       |             |       |
| 4     |      |               |       |             |       |
| 5     |      |               |       |             |       |
| 6     |      |               |       |             |       |
| 7     |      |               |       |             |       |
| 8     |      |               |       |             |       |
| 9     |      |               |       |             |       |
| 10    |      |               |       |             |       |

ROB pointers: Head at entry 1.

Load Buffers

|       | Busy | Address     |
|-------|------|-------------|
| Load1 | Yes  | 34+Regs[R2] |
| Load2 |      |             |
| Load3 |      |             |

Register Status

|          | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #1  |    |     |     |     |     |
| Busy     | No | No | No | Yes | No | No  | No  |     | No  |



Cycle 2

Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction   | State | Destination | Value |
|-------|------|---------------|-------|-------------|-------|
| 1     | Yes  | LD F6, 34(R2) | Ex1   | F6          |       |
| 2     | Yes  | LD F2, 45(R3) | Issue | F2          |       |
| 3     |      |               |       |             |       |
| 4     |      |               |       |             |       |
| 5     |      |               |       |             |       |
| 6     |      |               |       |             |       |
| 7     |      |               |       |             |       |
| 8     |      |               |       |             |       |
| 9     |      |               |       |             |       |
| 10    |      |               |       |             |       |

ROB pointers: Head at entry 1, Tail at entry 2.

Load Buffers

|       | Busy | Address     |
|-------|------|-------------|
| Load1 | Yes  | 34+Regs[R2] |
| Load2 | Yes  | 45+Regs[R3] |
| Load3 |      |             |

Register Status

|          | F0 | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|-----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    | #2  |    | #1  |    |     |     |     |     |
| Busy     | No | Yes | No | Yes | No | No  | No  |     | No  |



Cycle 3

Reservation Station

| Time | Name  | Busy | Op   | Vj | Vk       | Qj | Qk | Dest |
|------|-------|------|------|----|----------|----|----|------|
| 0    | Add1  | No   |      |    |          |    |    |      |
| 0    | Add2  | No   |      |    |          |    |    |      |
| 0    | Add3  | No   |      |    |          |    |    |      |
| 0    | Mult1 | Yes  | Mult |    | Regs[F4] | #2 |    | #3   |
| 0    | Mult2 | No   |      |    |          |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction     | State | Destination | Value      |
|-------|------|-----------------|-------|-------------|------------|
| 1     | Yes  | LD F6, 34(R2)   | Write | F6          | Mem[load1] |
| 2     | Yes  | LD F2, 45(R3)   | Ex1   | F2          |            |
| 3     | Yes  | MULT F0, F2, F4 | Issue | F0          |            |
| 4     |      |                 |       |             |            |
| 5     |      |                 |       |             |            |
| 6     |      |                 |       |             |            |
| 7     |      |                 |       |             |            |
| 8     |      |                 |       |             |            |
| 9     |      |                 |       |             |            |
| 10    |      |                 |       |             |            |

ROB pointers: Head at entry 1, Tail at entry 3.

Load Buffers

|       | Busy | Address     |
|-------|------|-------------|
| Load1 | No   |             |
| Load2 | Yes  | 45+Regs[R3] |
| Load3 |      |             |

Register Status

|          | F0  | F2  | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|-----|-----|----|-----|----|-----|-----|-----|-----|
| Reorder# | #3  | #2  |    | #1  |    |     |     |     |     |
| Busy     | Yes | Yes | No | Yes | No | No  | No  |     | No  |



Cycle 4

Reservation Station

| Time | Name  | Busy | Op   | Vj               | Vk               | Qj | Qk | Dest |
|------|-------|------|------|------------------|------------------|----|----|------|
| 2    | Add1  | Yes  | SUB  | Regs[F6]         | Mem[45+Regs[R3]] |    | #2 | #4   |
| 0    | Add2  | No   |      |                  |                  |    |    |      |
| 0    | Add3  | No   |      |                  |                  |    |    |      |
| 10   | Mult1 | Yes  | Mult | Mem[45+Regs[R3]] | Regs[F4]         |    |    | #3   |
| 0    | Mult2 | No   |      |                  |                  |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction     | State  | Dest. | Value      |
|-------|------|-----------------|--------|-------|------------|
| 1     | Yes  | LD F6, 34(R2)   | Commit | F6    | Mem[load1] |
| 2     | Yes  | LD F2, 45(R3)   | Write  | F2    | Mem[load2] |
| 3     | Yes  | MULT F0, F2, F4 | Issue  | F0    |            |
| 4     | Yes  | SUBD F8, F6, F2 | Issue  | F8    |            |
| 5     |      |                 |        |       |            |
| 6     |      |                 |        |       |            |
| 7     |      |                 |        |       |            |
| 8     |      |                 |        |       |            |
| 9     |      |                 |        |       |            |
| 10    |      |                 |        |       |            |

Load Buffers

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

|          | F0  | F2  | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|----------|-----|-----|----|----|-----|-----|-----|-----|-----|
| Reorder# | #3  | #2  |    |    | #4  |     |     |     |     |
| Busy     | Yes | Yes | No | No | Yes | No  | No  |     | No  |



Cycle 5

Reservation Station

| Time | Name  | Busy | Op   | Vj               | Vk               | Qj | Qk | Dest |
|------|-------|------|------|------------------|------------------|----|----|------|
| 1    | Add1  | Yes  | SUB  | Regs[F6]         | Mem[45+Regs[R3]] |    |    | #4   |
| 0    | Add2  | No   |      |                  |                  |    |    |      |
| 0    | Add3  | No   |      |                  |                  |    |    |      |
| 9    | Mult1 | Yes  | Mult | Mem[45+Regs[R3]] | Regs[F4]         |    |    | #3   |
| 0    | Mult2 | Yes  | DIV  |                  | Regs[F6]         | #3 |    | #5   |

Reorder Buffer

| Entry | Busy | Instruction      | State  | Dest. | Value      |
|-------|------|------------------|--------|-------|------------|
| 1     | Yes  | LD F6, 34(R2)    | Commit | F6    | Mem[load1] |
| 2     | Yes  | LD F2, 45(R3)    | Commit | F2    | Mem[load2] |
| 3     | Yes  | MULT F0, F2, F4  | Ex1    | F0    |            |
| 4     | Yes  | SUBD F8, F6, F2  | Ex1    | F8    |            |
| 5     | Yes  | DIVD F10, F0, F6 | Issue  | F10   |            |
| 6     |      |                  |        |       |            |
| 7     |      |                  |        |       |            |
| 8     |      |                  |        |       |            |
| 9     |      |                  |        |       |            |
| 10    |      |                  |        |       |            |

ROB pointers: Head at entry 3, Tail at entry 5.

Load Buffers

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

|          | F0  | F2 | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|----------|-----|----|----|----|-----|-----|-----|-----|-----|
| Reorder# | #3  |    |    |    | #4  | #5  |     |     |     |
| Busy     | Yes | No | No | No | Yes | Yes | No  |     | No  |

Cycle 6

Reservation Station

| Time | Name  | Busy | Op   | Vj               | Vk               | Qj | Qk | Dest |
|------|-------|------|------|------------------|------------------|----|----|------|
| 0    | Add1  | Yes  | SUB  | Regs[F6]         | Mem[45+Regs[R3]] |    |    | #4   |
| 0    | Add2  | Yes  | ADD  |                  | Regs[F2]         | #4 |    | #6   |
| 0    | Add3  | No   |      |                  |                  |    |    |      |
| 8    | Mult1 | Yes  | MULT | Mem[45+Regs[R3]] | Regs[F4]         |    |    | #3   |
| 0    | Mult2 | Yes  | DIV  |                  | Regs[F6]         | #3 |    | #5   |

Reorder Buffer

| Entry | Busy | Instruction      | State  | Dest. | Value      |
|-------|------|------------------|--------|-------|------------|
| 1     | Yes  | LD F6, 34(R2)    | Commit | F6    | Mem[load1] |
| 2     | Yes  | LD F2, 45(R3)    | Commit | F2    | Mem[load2] |
| 3     | Yes  | MULT F0, F2, F4  | Ex2    | F0    |            |
| 4     | Yes  | SUBD F8, F6, F2  | Ex2    | F8    |            |
| 5     | Yes  | DIVD F10, F0, F6 | Issue  | F10   |            |
| 6     | Yes  | ADDD F6, F8, F2  | Issue  | F6    |            |
| 7     |      |                  |        |       |            |
| 8     |      |                  |        |       |            |
| 9     |      |                  |        |       |            |
| 10    |      |                  |        |       |            |

ROB pointers: Head at entry 3, Tail at entry 6.

Load Buffers

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

|          | F0  | F2 | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|----------|-----|----|----|-----|-----|-----|-----|-----|-----|
| Reorder# | #3  |    |    | #6  | #4  | #5  |     |     |     |
| Busy     | Yes | No | No | Yes | Yes | Yes | No  |     | No  |



Cycle 7

Reservation Station

| Time | Name  | Busy | Op   | Vj               | Vk       | Qj | Qk | Dest |
|------|-------|------|------|------------------|----------|----|----|------|
| 0    | Add1  | No   |      |                  |          |    |    |      |
| 2    | Add2  | Yes  | ADD  | #4               | Regs[F2] |    |    | #6   |
| 0    | Add3  | No   |      |                  |          |    |    |      |
| 7    | Mult1 | Yes  | MULT | Mem[45+Regs[R3]] | Regs[F4] |    |    | #3   |
| 0    | Mult2 | Yes  | DIV  |                  | Regs[F6] | #3 |    | #5   |

Reorder Buffer

| Entry | Busy | Instruction      | State  | Dest. | Value      |
|-------|------|------------------|--------|-------|------------|
| 1     | Yes  | LD F6, 34(R2)    | Commit | F6    | Mem[load1] |
| 2     | Yes  | LD F2, 45(R3)    | Commit | F2    | Mem[load2] |
| 3     | Yes  | MULT F0, F2, F4  | Ex3    | F0    |            |
| 4     | Yes  | SUBD F8, F6, F2  | Write  | F8    | F6-#2      |
| 5     | Yes  | DIVD F10, F0, F6 | Issue  | F10   |            |
| 6     | Yes  | ADDD F6, F8, F2  | Issue  | F6    |            |
| 7     |      |                  |        |       |            |
| 8     |      |                  |        |       |            |
| 9     |      |                  |        |       |            |
| 10    |      |                  |        |       |            |

ROB pointers: Head at entry 3, Tail at entry 6.

Load Buffers

|       | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

|          | F0  | F2 | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|----------|-----|----|----|-----|-----|-----|-----|-----|-----|
| Reorder# | #3  |    |    | #6  | #4  | #5  |     |     |     |
| Busy     | Yes | No | No | Yes | Yes | Yes | No  |     | No  |



| LD: 1 cycle |          |       | MULT: 10 cycles |                  | SUBD/ADDD: 2 cycles |       |            | DIVD: 40 cycles |      |             |
|-------------|----------|-------|-----------------|------------------|---------------------|-------|------------|-----------------|------|-------------|
| Time        | Name     | Busy  | Op              | Vj               | Vk                  | Qj    | Qk         | Dest            |      |             |
| 0           | Add1     | No    |                 |                  |                     |       |            |                 |      | Reservation |
| 1           | Add2     | Yes   | ADD             | #4               | Regs[F2]            |       |            | #6              |      | Station     |
| 0           | Add3     | No    |                 |                  |                     |       |            |                 |      |             |
| 6           | Mult1    | Yes   | MULT            | Mem[45+Regs[R3]] | Regs[F4]            |       |            | #3              |      |             |
| 0           | Mult2    | Yes   | DIV             |                  | Regs[F6]            | #3    |            | #5              |      |             |
|             |          |       |                 |                  |                     |       |            |                 | Busy | Address     |
|             |          | Entry | Busy            | Instruction      | State               | Dest. | Value      | Load1           | No   |             |
|             |          | 1     | Yes             | LD F6, 34(R2)    | Commit              | F6    | Mem[load1] | Load2           | No   |             |
|             |          | 2     | Yes             | LD F2, 45(R3)    | Commit              | F2    | Mem[load2] | Load3           |      |             |
|             | Head     | 3     | Yes             | MULT F0, F2, F4  | Ex4                 | F0    |            |                 |      |             |
|             |          | 4     | Yes             | SUBD F8, F6, F2  | Write               | F8    | F6-#2      |                 |      |             |
|             |          | 5     | Yes             | DIVD F10, F0, F6 | Issue               | F10   |            |                 |      |             |
|             | Tail     | 6     | Yes             | ADDD F6, F8, F2  | Ex1                 | F6    |            |                 |      |             |
|             |          | 7     |                 |                  |                     |       |            | Reorder Buffer  |      |             |
|             |          | 8     |                 |                  |                     |       |            |                 |      |             |
|             |          | 9     |                 |                  |                     |       |            |                 |      |             |
|             |          | 10    |                 |                  |                     |       |            |                 |      |             |
| Cycle       |          | F0    | F2              | F4               | F6                  | F8    | F10        | F12             | ...  | F30         |
| 8           | Reorder# | #3    |                 |                  | #6                  | #4    | #5         |                 |      |             |
|             | Busy     | Yes   | No              | No               | Yes                 | Yes   | Yes        | No              |      | No          |



# Tomasulo With Reorder Buffer-Cycle 9

| LD: 1 cycle |          |       | MULT: 10 cycles |                  | SUBD/ADDD: 2 cycles |       |            | DIVD: 40 cycles |      |             |
|-------------|----------|-------|-----------------|------------------|---------------------|-------|------------|-----------------|------|-------------|
| Time        | Name     | Busy  | Op              | Vj               | Vk                  | Qj    | Qk         | Dest            |      |             |
| 0           | Add1     | No    |                 |                  |                     |       |            |                 |      | Reservation |
| 0           | Add2     | Yes   | ADD             | #4               | Regs[F2]            |       |            | #6              |      | Station     |
| 0           | Add3     | No    |                 |                  |                     |       |            |                 |      |             |
| 5           | Mult1    | Yes   | MULT            | Mem[45+Regs[R3]] | Regs[F4]            |       |            | #3              |      |             |
| 0           | Mult2    | Yes   | DIV             |                  | Regs[F6]            | #3    |            | #5              |      |             |
|             |          |       |                 |                  |                     |       |            |                 | Busy | Address     |
|             |          | Entry | Busy            | Instruction      | State               | Dest. | Value      | Load1           | No   |             |
|             |          | 1     | Yes             | LD F6, 34(R2)    | Commit              | F6    | Mem[load1] | Load2           | No   |             |
|             |          | 2     | Yes             | LD F2, 45(R3)    | Commit              | F2    | Mem[load2] | Load3           |      |             |
|             | Head     | 3     | Yes             | MULT F0, F2, F4  | Ex5                 | F0    |            |                 |      |             |
|             |          | 4     | Yes             | SUBD F8, F6, F2  | Write               | F8    | F6-#2      |                 |      |             |
|             |          | 5     | Yes             | DIVD F10, F0, F6 | Issue               | F10   |            |                 |      |             |
|             | Tail     | 6     | Yes             | ADDD F6, F8, F2  | Ex2                 | F6    |            |                 |      |             |
|             |          | 7     |                 |                  |                     |       |            | Reorder Buffer  |      |             |
|             |          | 8     |                 |                  |                     |       |            |                 |      |             |
|             |          | 9     |                 |                  |                     |       |            |                 |      |             |
|             |          | 10    |                 |                  |                     |       |            |                 |      |             |
| Cycle       |          | F0    | F2              | F4               | F6                  | F8    | F10        | F12             | ...  | F30         |
| 9           | Reorder# | #3    |                 |                  | #6                  | #4    | #5         |                 |      |             |
|             | Busy     | Yes   | No              | No               | Yes                 | Yes   | Yes        | No              |      | No          |

5/8/2023 xhzhou@USTC 35




| LD: 1 cycle |          |       | MULT: 10 cycles |                  | SUBD/ADDD: 2 cycles |       |            | DIVD: 40 cycles |      |             |
|-------------|----------|-------|-----------------|------------------|---------------------|-------|------------|-----------------|------|-------------|
| Time        | Name     | Busy  | Op              | Vj               | Vk                  | Qj    | Qk         | Dest            |      |             |
| 0           | Add1     | No    |                 |                  |                     |       |            |                 |      | Reservation |
| 0           | Add2     | No    |                 |                  |                     |       |            |                 |      | Station     |
| 0           | Add3     | No    |                 |                  |                     |       |            |                 |      |             |
| 3           | Mult1    | Yes   | MULT            | Mem[45+Regs[R3]] | Regs[F4]            |       |            | #3              |      |             |
| 0           | Mult2    | Yes   | DIV             |                  | Regs[F6]            | #3    |            | #5              |      |             |
|             |          |       |                 |                  |                     |       |            |                 | Busy | Address     |
|             |          | Entry | Busy            | Instruction      | State               | Dest. | Value      | Load1           | No   |             |
|             |          | 1     | Yes             | LD F6, 34(R2)    | Commit              | F6    | Mem[load1] | Load2           | No   |             |
|             |          | 2     | Yes             | LD F2, 45(R3)    | Commit              | F2    | Mem[load2] | Load3           |      |             |
|             | Head     | 3     | Yes             | MULT F0, F2, F4  | Ex7                 | F0    |            |                 |      |             |
|             |          | 4     | Yes             | SUBD F8, F6, F2  | Write               | F8    | F6-#2      |                 |      |             |
|             |          | 5     | Yes             | DIVD F10, F0, F6 | Issue               | F10   |            |                 |      |             |
|             | Tail     | 6     | Yes             | ADDD F6, F8, F2  | Write               | F6    | #4+F2      |                 |      |             |
|             |          | 7     |                 |                  |                     |       |            | Reorder Buffer  |      |             |
|             |          | 8     |                 |                  |                     |       |            |                 |      |             |
|             |          | 9     |                 |                  |                     |       |            |                 |      |             |
|             |          | 10    |                 |                  |                     |       |            |                 |      |             |
| Cycle       |          | F0    | F2              | F4               | F6                  | F8    | F10        | F12             | ...  | F30         |
| 11          | Reorder# | #3    |                 |                  | #6                  | #4    | #5         |                 |      |             |
|             | Busy     | Yes   | No              | No               | Yes                 | Yes   | Yes        | No              |      | No          |



| LD: 1 cycle |          |       | MULT: 10 cycles |                  | SUBD/ADDD: 2 cycles |       |            | DIVD: 40 cycles |      |             |
|-------------|----------|-------|-----------------|------------------|---------------------|-------|------------|-----------------|------|-------------|
| Time        | Name     | Busy  | Op              | Vj               | Vk                  | Qj    | Qk         | Dest            |      |             |
| 0           | Add1     | No    |                 |                  |                     |       |            |                 |      | Reservation |
| 0           | Add2     | No    |                 |                  |                     |       |            |                 |      | Station     |
| 0           | Add3     | No    |                 |                  |                     |       |            |                 |      |             |
| 2           | Mult1    | Yes   | MULT            | Mem[45+Regs[R3]] | Regs[F4]            |       |            | #3              |      |             |
| 0           | Mult2    | Yes   | DIV             |                  | Regs[F6]            | #3    |            | #5              |      |             |
|             |          |       |                 |                  |                     |       |            |                 | Busy | Address     |
|             |          | Entry | Busy            | Instruction      | State               | Dest. | Value      | Load1           | No   |             |
|             |          | 1     | Yes             | LD F6, 34(R2)    | Commit              | F6    | Mem[load1] | Load2           | No   |             |
|             |          | 2     | Yes             | LD F2, 45(R3)    | Commit              | F2    | Mem[load2] | Load3           |      |             |
|             | Head     | 3     | Yes             | MULT F0, F2, F4  | Ex8                 | F0    |            |                 |      |             |
|             |          | 4     | Yes             | SUBD F8, F6, F2  | Write               | F8    | F6-#2      |                 |      |             |
|             |          | 5     | Yes             | DIVD F10, F0, F6 | Issue               | F10   |            |                 |      |             |
|             | Tail     | 6     | Yes             | ADDD F6, F8, F2  | Write               | F6    | #4+F2      |                 |      |             |
|             |          | 7     |                 |                  |                     |       |            | Reorder Buffer  |      |             |
|             |          | 8     |                 |                  |                     |       |            |                 |      |             |
|             |          | 9     |                 |                  |                     |       |            |                 |      |             |
|             |          | 10    |                 |                  |                     |       |            |                 |      |             |
| Cycle       |          | F0    | F2              | F4               | F6                  | F8    | F10        | F12             | ...  | F30         |
| 12          | Reorder# | #3    |                 |                  | #6                  | #4    | #5         |                 |      |             |
|             | Busy     | Yes   | No              | No               | Yes                 | Yes   | Yes        | No              |      | No          |



| LD: 1 cycle |          |       | MULT: 10 cycles |                  | SUBD/ADDD: 2 cycles |       |            | DIVD: 40 cycles |      |             |
|-------------|----------|-------|-----------------|------------------|---------------------|-------|------------|-----------------|------|-------------|
| Time        | Name     | Busy  | Op              | Vj               | Vk                  | Qj    | Qk         | Dest            |      |             |
| 0           | Add1     | No    |                 |                  |                     |       |            |                 |      | Reservation |
| 0           | Add2     | No    |                 |                  |                     |       |            |                 |      | Station     |
| 0           | Add3     | No    |                 |                  |                     |       |            |                 |      |             |
| 1           | Mult1    | Yes   | MULT            | Mem[45+Regs[R3]] | Regs[F4]            |       |            | #3              |      |             |
| 0           | Mult2    | Yes   | DIV             |                  | Regs[F6]            | #3    |            | #5              |      |             |
|             |          |       |                 |                  |                     |       |            |                 | Busy | Address     |
|             |          | Entry | Busy            | Instruction      | State               | Dest. | Value      | Load1           | No   |             |
|             |          | 1     | Yes             | LD F6, 34(R2)    | Commit              | F6    | Mem[load1] | Load2           | No   |             |
|             |          | 2     | Yes             | LD F2, 45(R3)    | Commit              | F2    | Mem[load2] | Load3           |      |             |
|             | Head     | 3     | Yes             | MULT F0, F2, F4  | Ex9                 | F0    |            |                 |      |             |
|             |          | 4     | Yes             | SUBD F8, F6, F2  | Write               | F8    | F6-#2      |                 |      |             |
|             |          | 5     | Yes             | DIVD F10, F0, F6 | Issue               | F10   |            |                 |      |             |
|             | Tail     | 6     | Yes             | ADDD F6, F8, F2  | Write               | F6    | #4+F2      |                 |      |             |
|             |          | 7     |                 |                  |                     |       |            | Reorder Buffer  |      |             |
|             |          | 8     |                 |                  |                     |       |            |                 |      |             |
|             |          | 9     |                 |                  |                     |       |            |                 |      |             |
|             |          | 10    |                 |                  |                     |       |            |                 |      |             |
| Cycle       |          | F0    | F2              | F4               | F6                  | F8    | F10        | F12             | ...  | F30         |
| 13          | Reorder# | #3    |                 |                  | #6                  | #4    | #5         |                 |      |             |
|             | Busy     | Yes   | No              | No               | Yes                 | Yes   | Yes        | No              |      | No          |
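
In the cycle 13 snapshot, SUBD (entry 4) and ADDD (entry 6) have already reached Write, yet neither can leave the reorder buffer because MULT at the head (entry 3) is still executing. The following is a minimal Python sketch of that in-order commit rule, not the slides' own implementation. The names `RobEntry` and `commit_in_order` are hypothetical, all execution stages are collapsed into a single `"Ex"` state, and the sketch assumes any number of entries may retire in one call, whereas real hardware limits commit width.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    number: int        # reorder buffer entry number (#3, #4, ...)
    instruction: str   # instruction text, kept for display only
    state: str         # "Issue", "Ex", or "Write"


def commit_in_order(rob):
    """Retire entries from the head while the head has written its result.

    Returns the committed entry numbers. Stops at the first entry that
    has not reached "Write", even if younger entries already have.
    """
    committed = []
    while rob and rob[0].state == "Write":
        committed.append(rob.popleft().number)
    return committed


# Uncommitted entries at cycle 13; the head is entry 3.
rob = deque([
    RobEntry(3, "MULT F0, F2, F4", "Ex"),
    RobEntry(4, "SUBD F8, F6, F2", "Write"),
    RobEntry(5, "DIVD F10, F0, F6", "Issue"),
    RobEntry(6, "ADDD F6, F8, F2", "Write"),
])

blocked = commit_in_order(rob)   # head still executing: nothing retires
rob[0].state = "Write"           # hypothetical later cycle: MULT writes its result
retired = commit_in_order(rob)   # entries 3 and 4 retire; DIVD now blocks ADDD
```

After the second call the buffer still holds entries 5 and 6: ADDD has finished, but it stays behind DIVD, which is how the reorder buffer keeps F6 from being updated out of program order.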



| cycles   |                                        | MU                                                                                                                                                                                                                                                                                         | JLT: 10 cycles                                                                     | SUBD/                                                                                                                                                                                                                                                                                                                                                                                                                      | ADDD:                                                                                                                                                             | 2cycles                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | DIVD:                                                                      | 40 cy                                                                   | cles      |
|----------|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|-------------------------------------------------------------------------|-----------|
| Name     | Busy                                   | 0p                                                                                                                                                                                                                                                                                         | Vj                                                                                 | Vk                                                                                                                                                                                                                                                                                                                                                                                                                         | Qj                                                                                                                                                                | Qk                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Dest                                                                       |                                                                         |           |
|          |                                        |                                                                                                                                                                                                                                                                                            |                                                                                    |                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                            |                                                                         | rvation   |
|          |                                        |                                                                                                                                                                                                                                                                                            |                                                                                    |                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                            | St                                                                      | ation     |
| Add3     | No                                     |                                                                                                                                                                                                                                                                                            |                                                                                    |                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                            |                                                                         |           |
| Mult1    | Yes                                    | MULT                                                                                                                                                                                                                                                                                       | Mem[45+Regs[R3]]                                                                   | Regs[F4]                                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | #3                                                                         |                                                                         |           |
| Mult2    | Yes                                    | DIV                                                                                                                                                                                                                                                                                        |                                                                                    | Regs[F6]                                                                                                                                                                                                                                                                                                                                                                                                                   | #3                                                                                                                                                                |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | #5                                                                         |                                                                         |           |

Reorder buffer:

|      | Entry | Busy | Instruction      | State  | Dest. | Value |
|------|-------|------|------------------|--------|-------|-------|
|      | 1     | Yes  | LD F6, 34(R2)    | Commit | F6    |       |
|      | 2     | Yes  | LD F2, 45(R3)    | Commit | F2    |       |
| Head | 3     | Yes  | MULT F0, F2, F4  | Ex10   | F0    |       |
|      | 4     | Yes  | SUBD F8, F6, F2  | Write  | F8    | F6-#2 |
|      | 5     | Yes  | DIVD F10, F0, F6 | Issue  | F10   |       |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2 |
|      | 7     |      |                  |        |       |       |
|      | 8     |      |                  |        |       |       |
|      | 9     |      |                  |        |       |       |
|      | 10    |      |                  |        |       |       |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |

Register status:

|          | F0  | F2 | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|----------|-----|----|----|-----|-----|-----|-----|-----|-----|
| Reorder# | #3  |    |    | #6  | #4  | #5  |     |     |     |
| Busy     | Yes | No | No | Yes | Yes | Yes | No  |     | No  |



Latencies: LD 1 cycle, MULT 10 cycles, SUBD/ADDD 2 cycles, DIVD 40 cycles.

Cycle 15

Reservation stations:

| Time | Name  | Busy | Op  | Vj          | Vk       | Qj | Qk | Dest |
|------|-------|------|-----|-------------|----------|----|----|------|
| 0    | Add1  | No   |     |             |          |    |    |      |
| 0    | Add2  | No   |     |             |          |    |    |      |
|      | Add3  | No   |     |             |          |    |    |      |
| 0    | Mult1 | No   |     |             |          |    |    |      |
| 40   | Mult2 | Yes  | DIV | #2*Regs[F4] | Regs[F6] |    |    | #5   |

Reorder buffer:

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34(R2)    | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45(R3)    | Commit | F2    | Mem[load2] |
| Head | 3     | Yes  | MULT F0, F2, F4  | Write  | F0    | #2*F4      |
|      | 4     | Yes  | SUBD F8, F6, F2  | Write  | F8    | F6-#2      |
|      | 5     | Yes  | DIVD F10, F0, F6 | Issue  | F10   |            |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load buffers:

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register status:

|          | F0  | F2 | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|----------|-----|----|----|-----|-----|-----|-----|-----|-----|
| Reorder# | #3  |    |    | #6  | #4  | #5  |     |     |     |
| Busy     | Yes | No | No | Yes | Yes | Yes | No  |     | No  |



Latencies: LD: 1 cycle; MULT: 10 cycles; SUBD/ADDD: 2 cycles; DIVD: 40 cycles

Reservation Station

| Time | Name  | Busy | Op  | Vj          | Vk       | Qj | Qk | Dest |
|------|-------|------|-----|-------------|----------|----|----|------|
| 0    | Add1  | No   |     |             |          |    |    |      |
| 0    | Add2  | No   |     |             |          |    |    |      |
| 0    | Add3  | No   |     |             |          |    |    |      |
| 0    | Mult1 | No   |     |             |          |    |    |      |
| 39   | Mult2 | Yes  | DIV | #2*Regs[F4] | Regs[F6] |    |    | #5   |

Reorder Buffer

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|      | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
| Head | 4     | Yes  | SUBD F8, F6, F2  | Write  | F8    | F6-#2      |
|      | 5     | Yes  | DIVD F10, F0, F6 | Ex1    | F10   |            |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 16 | F0 | F2 | F4 | F6  | F8  | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|-----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  | #4  | #5  |     |     |     |
| Busy     | No | No | No | Yes | Yes | Yes | No  |     | No  |



Latencies: LD: 1 cycle; MULT: 10 cycles; SUBD/ADDD: 2 cycles; DIVD: 40 cycles

Reservation Station

| Time | Name  | Busy | Op  | Vj          | Vk       | Qj | Qk | Dest |
|------|-------|------|-----|-------------|----------|----|----|------|
| 0    | Add1  | No   |     |             |          |    |    |      |
| 0    | Add2  | No   |     |             |          |    |    |      |
| 0    | Add3  | No   |     |             |          |    |    |      |
| 0    | Mult1 | No   |     |             |          |    |    |      |
| 38   | Mult2 | Yes  | DIV | #2*Regs[F4] | Regs[F6] |    |    | #5   |

Reorder Buffer

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|      | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
|      | 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
| Head | 5     | Yes  | DIVD F10, F0, F6 | Ex2    | F10   |            |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 17 | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  |    | #5  |     |     |     |
| Busy     | No | No | No | Yes | No | Yes | No  |     | No  |



Latencies: LD: 1 cycle; MULT: 10 cycles; SUBD/ADDD: 2 cycles; DIVD: 40 cycles

Reservation Station

| Time | Name  | Busy | Op  | Vj          | Vk       | Qj | Qk | Dest |
|------|-------|------|-----|-------------|----------|----|----|------|
| 0    | Add1  | No   |     |             |          |    |    |      |
| 0    | Add2  | No   |     |             |          |    |    |      |
| 0    | Add3  | No   |     |             |          |    |    |      |
| 0    | Mult1 | No   |     |             |          |    |    |      |
| 37   | Mult2 | Yes  | DIV | #2*Regs[F4] | Regs[F6] |    |    | #5   |

Reorder Buffer

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|      | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
|      | 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
| Head | 5     | Yes  | DIVD F10, F0, F6 | Ex3    | F10   |            |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 18 | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  |    | #5  |     |     |     |
| Busy     | No | No | No | Yes | No | Yes | No  |     | No  |



Continue for 37 more cycles.



Reservation Station

| Time | Name  | Busy | Op  | Vj          | Vk       | Qj | Qk | Dest |
|------|-------|------|-----|-------------|----------|----|----|------|
| 0    | Add1  | No   |     |             |          |    |    |      |
| 0    | Add2  | No   |     |             |          |    |    |      |
| 0    | Add3  | No   |     |             |          |    |    |      |
| 0    | Mult1 | No   |     |             |          |    |    |      |
| 0    | Mult2 | Yes  | DIV | #2*Regs[F4] | Regs[F6] |    |    | #5   |

Reorder Buffer

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|      | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
|      | 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
| Head | 5     | Yes  | DIVD F10, F0, F6 | Ex40   | F10   | #3/F6      |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 55 | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  |    | #5  |     |     |     |
| Busy     | No | No | No | Yes | No | Yes | No  |     | No  |



Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

|      | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------|-------|------|------------------|--------|-------|------------|
|      | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|      | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|      | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
|      | 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
| Head | 5     | Yes  | DIVD F10, F0, F6 | Write  | F10   | #3/F6      |
| Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|      | 7     |      |                  |        |       |            |
|      | 8     |      |                  |        |       |            |
|      | 9     |      |                  |        |       |            |
|      | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 56 | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  |    | #5  |     |     |     |
| Busy     | No | No | No | Yes | No | Yes | No  |     | No  |



Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

|            | Entry | Busy | Instruction      | State  | Dest. | Value      |
|------------|-------|------|------------------|--------|-------|------------|
|            | 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
|            | 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
|            | 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
|            | 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
|            | 5     | Yes  | DIVD F10, F0, F6 | Commit | F10   | #3/F6      |
| Head, Tail | 6     | Yes  | ADDD F6, F8, F2  | Write  | F6    | #4+F2      |
|            | 7     |      |                  |        |       |            |
|            | 8     |      |                  |        |       |            |
|            | 9     |      |                  |        |       |            |
|            | 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 57 | F0 | F2 | F4 | F6  | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|-----|----|-----|-----|-----|-----|
| Reorder# |    |    |    | #6  |    |     |     |     |     |
| Busy     | No | No | No | Yes | No | No  | No  |     | No  |



Reservation Station

| Time | Name  | Busy | Op | Vj | Vk | Qj | Qk | Dest |
|------|-------|------|----|----|----|----|----|------|
| 0    | Add1  | No   |    |    |    |    |    |      |
| 0    | Add2  | No   |    |    |    |    |    |      |
| 0    | Add3  | No   |    |    |    |    |    |      |
| 0    | Mult1 | No   |    |    |    |    |    |      |
| 0    | Mult2 | No   |    |    |    |    |    |      |

Reorder Buffer

| Entry | Busy | Instruction      | State  | Dest. | Value      |
|-------|------|------------------|--------|-------|------------|
| 1     | Yes  | LD F6, 34 (R2)   | Commit | F6    | Mem[load1] |
| 2     | Yes  | LD F2, 45 (R3)   | Commit | F2    | Mem[load2] |
| 3     | Yes  | MULT F0, F2, F4  | Commit | F0    | #2*F4      |
| 4     | Yes  | SUBD F8, F6, F2  | Commit | F8    | F6-#2      |
| 5     | Yes  | DIVD F10, F0, F6 | Commit | F10   | #3/F6      |
| 6     | Yes  | ADDD F6, F8, F2  | Commit | F6    | #4+F2      |
| 7     |      |                  |        |       |            |
| 8     |      |                  |        |       |            |
| 9     |      |                  |        |       |            |
| 10    |      |                  |        |       |            |

Load Buffers

| Name  | Busy | Address |
|-------|------|---------|
| Load1 | No   |         |
| Load2 | No   |         |
| Load3 |      |         |

Register Status

| Cycle 58 | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|----------|----|----|----|----|----|-----|-----|-----|-----|
| Reorder# |    |    |    |    |    |     |     |     |     |
| Busy     | No | No | No | No | No | No  | No  |     | No  |



## Tomasulo with Reorder Buffer: Summary

| Instruction      | Issue | Exec Comp | WriteBack | Commit |
|------------------|-------|-----------|-----------|--------|
| LD F6, 34 (R2)   | 1     | 2         | 3         | 4      |
| LD F2, 45 (R3)   | 2     | 3         | 4         | 5      |
| MULT F0, F2, F4  | 3     | 5-14      | 15        | 16     |
| SUBD F8, F6, F2  | 4     | 5-6       | 7         | 17     |
| DIVD F10, F0, F6 | 5     | 16-55     | 56        | 57     |
| ADDD F6, F8, F2  | 6     | 8-9       | 10        | 58     |

In-order issue, out-of-order execution, out-of-order completion, in-order commit.
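The Commit column can be derived from the WriteBack column alone. The sketch below is a minimal illustration, assuming one commit per cycle and that an instruction commits no earlier than the cycle after its write-back; the function name `commit_cycles` is hypothetical.

```python
def commit_cycles(writeback):
    """Return commit cycles for instructions listed in program order.

    Assumptions of this sketch: at most one commit per cycle, and an
    instruction commits no earlier than the cycle after its write-back.
    """
    commits = []
    previous = 0
    for wb in writeback:
        # In-order commit: wait for both the own result and the older commit.
        cycle = max(wb + 1, previous + 1)
        commits.append(cycle)
        previous = cycle
    return commits


writeback = [3, 4, 15, 7, 56, 10]   # WriteBack column of the summary table
print(commit_cycles(writeback))     # [4, 5, 16, 17, 57, 58]
```

SUBD and ADDD finish early (cycles 7 and 10) but commit only at 17 and 58, because each must wait behind the long-running MULT and DIVD ahead of it.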



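## Comparing the Two Tomasulo Algorithms (Three-Stage vs. Four-Stage)

| Loop | L.S F0, 0(R1)    |
|------|------------------|
|      | L.S F1, 0(R2)    |
|      | ADD.S F2, F1, F0 |
|      | S.S F2, 0(R1)    |
|      | ADDI R1, R1, #4  |
|      | ADDI R2, R2, #4  |
|      | SUBI R3, R3, #1  |
|      | BNEZ R3, Loop    |

#### Assumptions:

- Load and store units: computing the memory address takes 2 cycles; the cache access takes 1 cycle
- Floating-point ADD: execution takes 6 cycles
- A store is split internally into two operations: S.S-A computes the memory address; S.S-D accesses the cache
- Other integer operations: execution takes 2 cycles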



# Tomasulo Algorithm Execution Example (No Prediction)

|     |                  | Issue | Exe Start | Exe End | Cache | CDB  | Notes                      |
|-----|------------------|-------|-----------|---------|-------|------|----------------------------|
| I1  | L.S F0, 0(R1)    | 1     | 2         | 3       | (4)   | (5)  |                            |
| I2  | L.S F1, 0(R2)    | 2     | 3         | 4       | (5)   | (6)  |                            |
| I3  | ADD.S F2, F1, F0 | 3     | 7         | 12      |       | (13) | Waits for F1               |
| I4  | S.S-A F2, 0(R1)  | 4     | 5         | 6       |       |      |                            |
| I5  | S.S-D F2, 0(R1)  | 5     | 14        | 15      | (16)  |      | Waits for F2               |
| I6  | ADDI R1, R1, #4  | 6     | 7         | 8       |       | (9)  |                            |
| I7  | ADDI R2, R2, #4  | 7     | 8         | 9       |       | (10) |                            |
| I8  | SUBI R3, R3, #1  | 8     | 9         | 10      |       | (11) |                            |
| I9  | BNEZ R3, Loop    | 9     | 12        | 13      |       | (14) | Waits for the value of R3  |
| I10 | L.S F0, 0(R1)    | 15    | 16        | 17      | (18)  | (19) | Waits for I9               |
| I11 | L.S F1, 0(R2)    | 16    | 17        | 18      | (19)  | (20) |                            |
| I12 | ADD.S F2, F1, F0 | 17    | 21        | 26      |       | (27) | Waits for F1               |
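The Exe Start, Exe End, and CDB entries of the non-memory rows follow a simple rule. The sketch below is one way to reproduce them, assuming execution starts no earlier than the cycle after issue and the cycle after every producer's CDB broadcast, and that the result is broadcast in the cycle after execution ends. The function `schedule` and its parameters are hypothetical names, and loads and stores (which add a cache cycle) are not modeled.

```python
def schedule(issue, latency, producer_cdb=()):
    """Return (exe_start, exe_end, cdb) for a non-memory instruction.

    issue:        issue cycle
    latency:      execution cycles (6 for ADD.S, 2 for integer operations)
    producer_cdb: CDB cycles of the instructions producing the operands
    """
    start = max([issue + 1] + [c + 1 for c in producer_cdb])
    end = start + latency - 1
    return start, end, end + 1


print(schedule(3, 6, [5, 6]))      # first ADD.S:  (7, 12, 13)
print(schedule(9, 2, [11]))        # BNEZ:         (12, 13, 14)
print(schedule(17, 6, [19, 20]))   # second ADD.S: (21, 26, 27)
```

Without prediction, the second iteration's first load cannot issue until cycle 15, the cycle after BNEZ broadcasts its outcome.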



# Tomasulo Algorithm Execution Example (With Prediction)

|     |                  | Issue | Exe Start | Exe End | Cache | CDB  | Commit | Notes |
|-----|------------------|-------|-----------|---------|-------|------|--------|-------|
| I1  | L.S F0, 0(R1)    | 1     | 2         | 3       | 4     | (5)  | 6      |       |
| I2  | L.S F1, 0(R2)    | 2     | 3         | 4       | 5     | (6)  | 7      |       |
| I3  | ADD.S F2, F1, F0 | 3     | 7         | 12      |       | (13) | 14     | Waits for F1 |
| I4  | S.S-A F2, 0(R1)  | 4     | 5         | 6       |       |      |        |       |
| I5  | S.S-D F2, 0(R1)  | 5     | 14        | 15      | 16    |      | (17)   | Waits for F2 |
| I6  | ADDI R1, R1, #4  | 6     | 7         | 8       |       | (9)  | (18)   |       |
| I7  | ADDI R2, R2, #4  | 7     | 8         | 9       |       | (10) | (19)   |       |
| I8  | SUBI R3, R3, #1  | 8     | 9         | 10      |       | (11) | (20)   |       |
| I9  | BNEZ R3, Loop    | 9     | 14        | 15      |       | (16) | (21)   | Waits for the value of R3; if it entered EXE in cycle 12 or 13, its write-result stage would conflict (CDB contention) with I10 or I11, respectively |
| I10 | L.S F0, 0(R1)    | 10    | 11        | 12      | 13    | (14) | (22)   |       |
| I11 | L.S F1, 0(R2)    | 11    | 12        | 13      | 14    | (15) | (23)   |       |
| I12 | ADD.S F2, F1, F0 | 12    | 16        | 21      |       | (22) | (24)   | Waits for F1 |
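The delayed start of BNEZ in the table can be reproduced by searching for the first start cycle whose CDB slot is free. This is only a sketch: it assumes the two loads of the next iteration have already claimed the CDB in cycles 14 and 15, and that a result is broadcast in the cycle after execution ends. The function name `earliest_start` is hypothetical.

```python
def earliest_start(ready, latency, cdb_busy):
    """Earliest execution start whose CDB slot is free.

    ready:    first cycle in which all operands are available
    latency:  execution cycles
    cdb_busy: set of cycles in which the CDB is already claimed
    """
    start = ready
    # The result would use the CDB in cycle start + latency; slide until free.
    while start + latency in cdb_busy:
        start += 1
    return start


# BNEZ has R3 available from cycle 12; the CDB is taken in cycles 14 and 15.
start = earliest_start(12, 2, {14, 15})
print(start, start + 2 - 1, start + 2)   # 14 15 16
```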



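## 5.4 Speculative Execution

#### Tomasulo with Speculative Execution Support

- 1. Machine organization with a ROB
- 2. Description of the four-stage algorithm

#### Code Execution Examples

- 1. Simple code example
- 2. Speculative execution example

#### Tomasulo Summary

- 1. Role of the ROB
- 2. Dynamic memory disambiguation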



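## Using the ROB to Maintain Precise Machine State

- The ROB maintains the precise state of the machine, which allows speculative execution
  - An instruction enters the commit stage only after it is confirmed to raise no exception
  - An instruction enters the commit stage only after the branch prediction is confirmed correct
  - On an exception or a misprediction:
    - Flush the ROB, the RS, and the register result status table
- Memory operations use a similar approach
  - Memory Ordering Buffer (MOB)
    - The result of a store is first held in the MOB and is then committed, at the commit stage, in the program order of the stores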
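The commit and flush behavior described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: it models only the ROB (not the RS or the register result status table, which a real flush would also clear), it treats a faulting head entry as the trigger for a flush, and the class and field names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: in-order allocation, out-of-order completion, in-order commit."""

    def __init__(self):
        self.entries = deque()

    def issue(self, dest):
        # Allocate an entry at the tail, in program order.
        entry = {"dest": dest, "value": None, "done": False, "fault": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value, fault=False):
        # Results may arrive in any order; they stay in the ROB until commit.
        entry.update(value=value, done=True, fault=fault)

    def commit(self, regs):
        """Process the head entry if it has finished; return True if it did."""
        if not self.entries or not self.entries[0]["done"]:
            return False
        head = self.entries[0]
        if head["fault"]:
            self.entries.clear()      # flush the head and every younger entry
            return True
        regs[head["dest"]] = head["value"]   # architectural state updated here
        self.entries.popleft()
        return True


regs = {}
rob = ReorderBuffer()
a = rob.issue("F0")
b = rob.issue("F8")
rob.write_result(b, 2.0)          # the younger instruction finishes first
print(rob.commit(regs), regs)     # False {}
rob.write_result(a, 1.0)
rob.commit(regs)
rob.commit(regs)
print(regs)                       # {'F0': 1.0, 'F8': 2.0}
```

Because registers are written only at commit, discarding the ROB contents is enough to remove every trace of the squashed instructions from the architectural state in this model.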



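## Memory Disambiguation: Handling Data Dependences Through Memory References

- Question: given an instruction sequence, are a store and a later load dependent? That is, does the following code have a dependence problem?

Example: st 0(R2), R5

...

ld R6, 0(R3)

- Can we start the ld early?
  - The store's address may not be available until much later.
  - We may want to start both operations in the same cycle.
- Two approaches:
  - No Speculation: do not perform the load until we are sure that the addresses differ, 0(R2) ≠ 0(R3)
  - Speculation: assume that the two are dependent, or that they are not (called "dependence speculation"); if the speculation turns out to be wrong, recover through the ROB
- Reference: Gonzalez, A., et al. (2011). "Processor Microarchitecture: An Implementation Perspective." Synthesis Lectures on Computer Architecture #12, Morgan & Claypool Publishers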
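Under dependence speculation, the check moves from "before the load" to "when the store's address finally resolves". The sketch below illustrates only that detection step, assuming the hardware keeps a record of the younger loads that already executed; the function name and the load identifiers are hypothetical, and recovery itself (squashing through the ROB) is not modeled.

```python
def violated_loads(store_addr, executed_younger_loads):
    """Return the younger loads that read store_addr before the store resolved.

    executed_younger_loads: list of (load_id, address) pairs for loads that
    were speculatively executed ahead of the store.
    """
    return [load_id for load_id, addr in executed_younger_loads
            if addr == store_addr]


# The store's address resolves to 0x100 after two younger loads already ran.
print(violated_loads(0x100, [("ld_a", 0x200), ("ld_b", 0x100)]))  # ['ld_b']
```

An empty result means the speculation was correct and nothing needs to be undone.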



## Memory Disambiguation

**TABLE 6.1:** Memory disambiguation schemes.

| NAME                            | SPECULATIVE | DESCRIPTION                                                                                                                                |
|---------------------------------|-------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| Total Ordering                  | No          | All memory accesses are processed in order.                                                                                                |
| Partial Ordering                | No          | All stores are processed in order, but loads execute out of order as long as all previous stores have computed their address.              |
| Load Ordering<br>Store Ordering | No          | Execution between loads and stores is out of order, but all loads execute in order among them, and all stores execute in order among them. |
| Store Ordering                  | Yes         | Stores execute in order, but loads execute completely out of order.                                                                        |

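- Basic principle of the non-speculative schemes: the current memory operation may execute only after the store instructions ahead of it have computed their memory addresses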
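The non-speculative principle can be written as a small decision function. This is a sketch for a single load, assuming the hardware tracks the addresses of the older stores that have not yet written memory; the function name and the three result strings are hypothetical, and what happens on a conflict (waiting for the store or forwarding its data) is left out.

```python
def load_may_execute(load_addr, older_store_addrs):
    """Non-speculative check for one load.

    older_store_addrs: addresses of the stores ahead of the load in program
    order that have not yet written memory; None means the address has not
    been computed yet.
    Returns "wait" if any older store address is unknown, "conflict" if an
    older store targets the same address, and "go" otherwise.
    """
    if any(addr is None for addr in older_store_addrs):
        return "wait"
    if load_addr in older_store_addrs:
        return "conflict"
    return "go"


print(load_may_execute(0x40, [0x80, None]))   # wait
print(load_may_execute(0x40, [0x80]))         # go
print(load_may_execute(0x40, [0x40]))         # conflict
```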



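## Summary: Tomasulo #1/3

#### Reservation stations: register renaming, buffering of source operands

- Keep the registers from becoming a bottleneck
- Avoid the WAR and WAW hazards that the scoreboard cannot resolve
- Allow loop unrolling in hardware
- Not limited to basic blocks (control dependences resolved quickly)

#### Reorder Buffer:

- Provides a mechanism for undoing instruction execution
- Instructions are held in the ROB in issue order
- Instructions commit in order

#### Branch prediction is very important for performance

- Speculative execution: start executing before the control dependence has been resolved
- Speculative execution relies on the ROB's mechanism for undoing instruction execution
  - On a misprediction, the speculatively executed instructions are squashed
- BHT-based branch prediction (predicts the branch direction)
- BTB-based branch prediction (predicts the branch target address)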



## Summary: Tomasulo #2/3

#### Contributions

- Dynamic scheduling
- Register renaming
- Load/store disambiguation
- After the 360/91, the Pentium II, PowerPC 604, MIPS R10000, HP-PA 8000, and Alpha 21264 used this technique
- Shortcomings:
  - Too many value copy operations
    - Register File → RS → ROB → Register File
  - Too many muxes/busses (CDB)
    - Values are from everywhere to everywhere else!
  - Reservation Stations mix values (data) and tags (control)
    - Slow down max clock frequency



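## Summary: Tomasulo #3/3

#### Memory access disambiguation

- Non-speculative disambiguation
  - Total Ordering
  - Partial Ordering
    - Once the stores ahead of a load have computed their addresses, the load may access memory out of order
  - Load Ordering, Store Ordering
    - Once the memory instructions ahead of a load have computed their addresses, the load at the head of the load queue may access memory before the store instructions
- Speculative execution
  - Store Ordering
  - Assume that a load is independent of earlier stores whose effective addresses have not yet been computed
- Question: rank the four schemes by their ability to exploit parallelism.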



## Acknowledgements

- These slides contain material developed and copyrighted by:
  - John Kubiatowicz (UCB)
  - Krste Asanovic (UCB)
  - John Hennessy (Stanford) and David Patterson (UCB)
  - Chenxi Zhang (Tongji)
  - Muhamed Mudawar (KFUPM)
- UCB material derived from courses CS152, CS252, CS61C
- KFUPM material derived from courses COE501, COE502