Answer the questions below using the following code (assume the initial value of x3 is x2+396):

| Label | Opcode | Operands | Comment |
| --- | --- | --- | --- |
| Loop: | lw | x1, 0(x2) | ; load x1 from address 0+x2 |
|  | addi | x1, x1, 1 | ; x1 = x1+1 |
|  | sw | x1, 0(x2) | ; store x1 at address 0+x2 |
|  | addi | x2, x2, 4 | ; x2 = x2+4 |
|  | sub | x4, x3, x2 | ; x4 = x3-x2 |
|  | bnez | x4, Loop | ; branch to Loop if x4 != 0 |
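The iteration count follows from the initial condition on x3 and the step of 4 in x2. As a sanity check, here is a minimal behavioural sketch in Python. It is not part of the original exercise. The function name `run_loop`, the base address `BASE`, and the dictionary used as memory are all assumptions made for illustration, and register wraparound is ignored.

```python
# Behavioural model of the loop; registers are plain Python ints
# (an assumption: fixed-width wraparound is not modelled).
def run_loop(memory, x2):
    """Run the loop on a dict mapping byte address -> word; return the iteration count."""
    x3 = x2 + 396              # initial condition given in the problem
    iterations = 0
    while True:
        x1 = memory[x2]        # lw   x1, 0(x2)
        x1 = x1 + 1            # addi x1, x1, 1
        memory[x2] = x1        # sw   x1, 0(x2)
        x2 = x2 + 4            # addi x2, x2, 4
        x4 = x3 - x2           # sub  x4, x3, x2
        iterations += 1
        if x4 == 0:            # bnez x4, Loop falls through when x4 == 0
            break
    return iterations

BASE = 0x1000                  # hypothetical starting address held in x2
memory = {BASE + 4 * i: i for i in range(99)}
count = run_loop(memory, BASE)
print(count)                   # 99, i.e. 396 / 4 iterations
```

Under these assumptions the loop increments each of 99 consecutive words once, so any per-iteration cycle count from the pipeline diagrams is multiplied by 99, adjusted for the pipeline fill and for the final, not-taken branch.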

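1. Find all RAW (read-after-write) data dependences in the code and list them in the following format: register, source instruction, destination instruction. For example: register x1, lw instruction, addi instruction.
2. Using the format below, draw the timing diagram for this instruction sequence on a 5-stage pipeline without forwarding or bypassing hardware. Assume that a register read and a register write in the same clock cycle are forwarded through the register file, so a value being written in the write-back stage can be read correctly in that cycle. Assume the branch target address is computed in the X stage (the next instruction cannot be fetched before the M stage). If a memory access takes 1 cycle, how many cycles does this loop take to execute?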

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | ... |
| --- | --- | --- | --- | --- | --- | --- | --- |
| lw x1, 0(x2) | F | D | X | M | W |  |  |
| addi x1, x1, 1 |  |  |  |  |  |  |  |
| sw x1, 0(x2) |  |  |  |  |  |  |  |
| addi x2, x2, 4 |  |  |  |  |  |  |  |
| sub x4, x3, x2 |  |  |  |  |  |  |  |
| bnez x4, Loop |  |  |  |  |  |  |  |
| lw x1, 0(x2) |  |  |  |  |  |  |  |
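The stalls in a diagram like the one above come from RAW dependences, so it can help to enumerate them mechanically before placing stall cycles. The following is a minimal Python sketch, not part of the original exercise. The `LOOP_BODY` encoding and the `raw_dependences` helper are hypothetical names introduced here. It scans a single iteration and pairs each source register with the most recent earlier instruction that writes it. Dependences carried into the next iteration through x2 are not reported.

```python
# Each entry: (instruction text, destination register or None, source registers).
# The encoding is an illustrative assumption; sw and bnez write no register.
LOOP_BODY = [
    ("lw x1, 0(x2)",   "x1", ["x2"]),
    ("addi x1, x1, 1", "x1", ["x1"]),
    ("sw x1, 0(x2)",   None, ["x1", "x2"]),
    ("addi x2, x2, 4", "x2", ["x2"]),
    ("sub x4, x3, x2", "x4", ["x3", "x2"]),
    ("bnez x4, Loop",  None, ["x4"]),
]

def raw_dependences(body):
    """Return (register, producer text, consumer text) for each RAW pair in one iteration."""
    last_writer = {}           # register -> text of the most recent writer
    found = []
    for text, dest, sources in body:
        # Sources are read before the destination is written,
        # so "addi x1, x1, 1" depends on the earlier lw, not on itself.
        for reg in sources:
            if reg in last_writer:
                found.append((reg, last_writer[reg], text))
        if dest is not None:
            last_writer[dest] = text
    return found

deps = raw_dependences(LOOP_BODY)
for reg, producer, consumer in deps:
    print(f"register {reg}: {producer} -> {consumer}")
```

Tracking only the most recent writer is a deliberate choice: sw reads the x1 produced by addi, not the one loaded by lw, so the lw-to-sw pair is not counted as a true dependence.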

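3. Using the format below, draw the timing diagram for this instruction sequence on a 5-stage pipeline with full forwarding or bypassing hardware. If a memory access takes 1 cycle, how many cycles does this loop take to execute?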

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | ... |
| --- | --- | --- | --- | --- | --- | --- | --- |
| lw x1, 0(x2) | F | D | X | M | W |  |  |
| addi x1, x1, 1 |  |  |  |  |  |  |  |
| sw x1, 0(x2) |  |  |  |  |  |  |  |
| addi x2, x2, 4 |  |  |  |  |  |  |  |
| sub x4, x3, x2 |  |  |  |  |  |  |  |
| bnez x4, Loop |  |  |  |  |  |  |  |
| lw x1, 0(x2) |  |  |  |  |  |  |  |

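4. If the instructions may be reordered and three additional registers x5, x6, and x7 may be used, what is the minimum number of cycles in which this code can complete? Give a detailed analysis.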