## 2021218152 陈嘉什



## 合肥工业大学 HEFEI UNIVERSITY OF TECHNOLOGY

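3.1 The baseline performance is 38 cycles per loop iteration. Each instruction takes one clock cycle plus its additional latency cycles.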

| Label | Instruction | Operands   | Cycles  |
|-------|-------------|------------|---------|
| Loop: | fld         | f2,0(Rx)   | 1+3=4   |
| I0:   | fmul.d      | f2,f0,f2   | 1+4=5   |
| I1:   | fdiv.d      | f8,f2,f0   | 1+10=11 |
| I2:   | fld         | f4,0(Ry)   | 1+3=4   |
| I3:   | fadd.d      | f4,f0,f4   | 1+2=3   |
| I4:   | fadd.d      | f10,f8,f2  | 1+2=3   |
| I5:   | fsd         | f4,0(Ry)   | 1+1=2   |
| I6:   | addi        | Rx,Rx,8    | 1       |
| I7:   | addi        | Ry,Ry,8    | 1       |
| I8:   | sub         | x20,x4,Rx  | 1       |
| I9:   | bnz         | x20,Loop   | 1+1=2   |

4+5+11+4+3+3+2+1+1+1+2 = 37. The branch delay slot adds 1 cycle, so the total is 37+1 = 38 cycles.
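The sum above can be checked with a short sketch. It assumes the extra latencies shown in the table (for example, +3 for `fld` and +10 for `fdiv.d`) and a one-cycle branch delay slot. The names `EXTRA_LATENCY`, `LOOP_OPCODES`, and `baseline_cycles` are illustrative and not part of the exercise.

```python
# Extra latency cycles beyond the single issue cycle, taken from the table.
EXTRA_LATENCY = {
    "fld": 3, "fsd": 1, "addi": 0, "sub": 0,
    "bnz": 1, "fadd.d": 2, "fmul.d": 4, "fdiv.d": 10,
}

# Opcodes of the loop body in program order (Loop, I0 .. I9).
LOOP_OPCODES = [
    "fld", "fmul.d", "fdiv.d", "fld", "fadd.d", "fadd.d",
    "fsd", "addi", "addi", "sub", "bnz",
]


def baseline_cycles(opcodes, branch_delay_slots=1):
    """Cycles per iteration when no instruction starts before the previous one completes."""
    serial = sum(1 + EXTRA_LATENCY[op] for op in opcodes)
    return serial + branch_delay_slots


print(baseline_cycles(LOOP_OPCODES))  # 37 + 1 = 38
```

Setting `branch_delay_slots=0` reproduces the intermediate sum of 37.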

3.2 The loop with stalls inserted only where a true data dependence requires them:

```
Loop: fld     f2,0(Rx)       1+3
      <stall due to LD latency>
      <stall due to LD latency>
      <stall due to LD latency>
I0:   fmul.d  f2,f0,f2       1+4
      <stall due to fmul.d latency>
      <stall due to fmul.d latency>
      <stall due to fmul.d latency>
      <stall due to fmul.d latency>
I1:   fdiv.d  f8,f2,f0       1+10
I2:   fld     f4,0(Ry)       1+3
      <stall due to LD latency>
      <stall due to LD latency>
      <stall due to LD latency>
I3:   fadd.d  f4,f0,f4       1+2
      <stall due to fdiv.d latency>
      <stall due to fdiv.d latency>
      <stall due to fdiv.d latency>
      <stall due to fdiv.d latency>
      <stall due to fdiv.d latency>
I4:   fadd.d  f10,f8,f2      1+2
I5:   fsd     f4,0(Ry)       1+1
I6:   addi    Rx,Rx,8
I7:   addi    Ry,Ry,8
I8:   sub     x20,x4,Rx
I9:   bnz     x20,Loop       1+1
      <stall due to branch delay>
```

Total: 11 instructions + 15 data-dependence stalls + 1 branch-delay stall = 27 clock cycles.
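The stall placement can be cross-checked with a minimal sketch of an in-order, single-issue pipeline that stalls only on read-after-write dependences. It assumes the same extra latencies as in 3.1, ignores structural and write-after-write hazards, and charges one stall cycle for the branch delay. The names `LOOP` and `schedule` and the tuple layout are hypothetical choices for this illustration.

```python
# Extra latency cycles beyond the single issue cycle (same values as in 3.1).
EXTRA_LATENCY = {
    "fld": 3, "fsd": 1, "addi": 0, "sub": 0,
    "bnz": 1, "fadd.d": 2, "fmul.d": 4, "fdiv.d": 10,
}

# (opcode, destination register or None, source registers), in program order.
LOOP = [
    ("fld",    "f2",  ["Rx"]),
    ("fmul.d", "f2",  ["f0", "f2"]),
    ("fdiv.d", "f8",  ["f2", "f0"]),
    ("fld",    "f4",  ["Ry"]),
    ("fadd.d", "f4",  ["f0", "f4"]),
    ("fadd.d", "f10", ["f8", "f2"]),
    ("fsd",    None,  ["f4", "Ry"]),
    ("addi",   "Rx",  ["Rx"]),
    ("addi",   "Ry",  ["Ry"]),
    ("sub",    "x20", ["x4", "Rx"]),
    ("bnz",    None,  ["x20"]),
]


def schedule(program, branch_delay=1):
    """Return (total cycles, data-dependence stall cycles) for one iteration."""
    ready = {}   # register -> first cycle in which its new value can be read
    cycle = 0    # cycle in which the previous instruction issued
    stalls = 0
    for op, dest, srcs in program:
        earliest = cycle + 1
        # Wait until every source operand has been produced.
        issue = max([earliest] + [ready.get(reg, 0) for reg in srcs])
        stalls += issue - earliest
        cycle = issue
        if dest is not None:
            ready[dest] = issue + EXTRA_LATENCY[op] + 1
    return cycle + branch_delay, stalls


print(schedule(LOOP))  # (27, 15)
```

Under these assumptions the model yields 3 stalls after the first `fld`, 4 after `fmul.d`, 3 after the second `fld`, and 5 waiting on `fdiv.d`, for 15 data-dependence stalls in total.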