# DIC Homework 3, Student ID: 312510017, 宋彥霆

# 1. Version without pipeline:

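First, I designed the convolution block diagram shown in the figure below. I then adapted the 2×2 example code provided by the TA into a 3×3 version; essentially, changing the number of registers from 4 to 9 is enough to complete the version without pipeline.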
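As a software reference for the datapath described above, here is a minimal Python sketch of a 3×3 convolution in which every output value is built from 9 multiplications and 9 accumulations. The 7×7 input size (which yields 5×5 = 25 outputs), the stride of 1, the absence of padding, and the function name `conv3x3` are assumptions for illustration; the actual RTL is not reproduced in this report.

```python
def conv3x3(ifm, kernel):
    """Valid (no padding, stride 1) 3x3 convolution on a 2-D list of numbers."""
    rows, cols = len(ifm), len(ifm[0])
    ofm = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            # The 9 window values play the role of the 9 input registers.
            window = [ifm[r + i][c + j] for i in range(3) for j in range(3)]
            weights = [kernel[i][j] for i in range(3) for j in range(3)]
            acc = 0
            for x, w in zip(window, weights):
                acc += x * w  # one multiply and one add per tap
            out_row.append(acc)
        ofm.append(out_row)
    return ofm


# Hypothetical 7x7 input (values 0..48) and an all-ones kernel.
ifm = [[r * 7 + c for c in range(7)] for r in range(7)]
kernel = [[1] * 3 for _ in range(3)]
ofm = conv3x3(ifm, kernel)
```

A model like this can serve as a golden reference when checking the simulated OFM values against expected results.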



## Pre-simulation:



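Next, after the design passed RTL verification, I synthesized the circuit with syn.tcl. With CLK = 1000 ps the slack time was negative, so I increased CLK\_period and kept searching upward. The final result is shown in the figure below, and the area found in the report file is shown in the figure after it.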

### Clk\_period: 1500ps

```
Combinational area: 32796.368686

Buf/Inv area: 1996.410262

Noncombinational area: 6229.042520

Macro/Black Box area: 0.000000

Net Interconnect area: undefined (No wire load specified)

Total cell area: 39025.411206

Total area: undefined
```

In\_valid positive: 129750 ps
Out\_valid negative: 243000ps

### Clk\_period: 1800ps:

```
clock clk (rise edge)                                1800.00    1800.00
clock network delay (ideal)                             0.00    1800.00
Out_OFM_reg_34_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)          0.00    1800.00 r
library setup time                                    -18.98    1781.02
data required time                                              1781.02
data arrival time                                              -1780.84
slack (MET)                                                        0.18
```
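The slack in the report above can be reproduced by hand. The following sketch assumes an ideal clock network (zero delay, as the report states), so that data required time = clock period - library setup time, and slack = data required time - data arrival time. The function name `slack_ps` is hypothetical.

```python
def slack_ps(clk_period, setup_time, arrival_time):
    """Setup slack in ps for an ideal clock (no network delay)."""
    required = clk_period - setup_time  # data required time
    return required - arrival_time      # positive or zero means timing is met


# Values taken from the 1800 ps timing report.
s = slack_ps(1800.00, 18.98, 1780.84)
timing_met = s >= 0
```

A negative result from the same calculation corresponds to a violated constraint, which is what happened at the 1000 ps constraint.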

```
Number of ports:                       1539
Number of nets:                       27503
Number of cells:                      24815
Number of combinational cells:
Number of sequential cells:            1027
Number of macros/black boxes:             0
Number of buf/inv:
Number of references:

Combinational area:
Buf/Inv area:                   1740.035533
Noncombinational area:          6229.042520
Macro/Black Box area:              0.000000
Net Interconnect area:      undefined  (No wire load specified)

Total cell area:               35360.815650
Total area:                 undefined
```

In\_valid positive: 155700 ps
Out\_valid negative: 291600ps

### Clk\_period: 2000ps:

```
data required time                 1981.01

data required time                 1981.01
data arrival time                 -1980.48

slack (MET)                           0.53
```

```
Combinational area: 28706.736968

Buf/Inv area: 1471.996818

Noncombinational area: 6229.042520

Macro/Black Box area: 0.000000

Net Interconnect area: undefined (No wire load specified)

Total cell area: 34935.779487

Total area: undefined
```

In\_valid positive: 173000ps
Out\_valid negative: 324000ps

## Post-simulation:



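Next, I opened nWave to check the operating time (the time at which out\_valid falls minus the time at which in\_valid rises), as shown in the figure below. The figure gives an operating time of 114000 ps. From the block diagram, going from the input until the first OFM has been fully output takes (simplified) 2\*9\*25 multiplications and additions, so the throughput is 2\*9\*25 divided by the elapsed time, that is, 450 / 114000 ps (in op/s). Finally, area efficiency = Throughput / Area.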
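The calculation described above can be written out as a short sketch. It uses the operation count 2\*9\*25 from the text and, as sample inputs, the operating time and total cell area reported for the 1500 ps run; the function names and the unit conversions (ps to s, µm² to mm²) are my own framing.

```python
OPS_PER_OFM = 2 * 9 * 25  # 9 multiplies + 9 adds per output, 25 outputs


def throughput_ops(operating_time_ps):
    """Operations per second for one full OFM."""
    return OPS_PER_OFM / (operating_time_ps * 1e-12)


def area_efficiency_gops_per_mm2(operating_time_ps, area_um2):
    """Throughput divided by area, in GOPS/mm^2."""
    gops = throughput_ops(operating_time_ps) / 1e9
    area_mm2 = area_um2 * 1e-6  # 1 mm^2 = 1e6 um^2
    return gops / area_mm2


# Sample inputs: 1500 ps run (243000 ps - 129750 ps = 113250 ps).
tp = throughput_ops(113250)
eff = area_efficiency_gops_per_mm2(113250, 39025.411206)
```

With these inputs the sketch gives roughly 3.97\*10^9 OPS and 101.8 GOPS/mm².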



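After finishing the gate-level simulation, I went back to the synthesis step, changed clock\_period in the syn.tcl file, synthesized again, and compared the resulting area and time. I used clock\_period values of 1500, 1800, and 2000 ps; the results are in the table below.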

| Clock_period (ps) | Area (µm²) | Operating_time (ps) | Throughput (OPS) | Area efficiency (GOPS/mm²) |
|--------------|-----------|--------------------|------------|-----------------|
| 1500         | 39025.411 | 113250             | 3.97*10^9  | 101.8           |
| 1800         | 35360.82  | 135900             | 3.31*10^9  | 93.64           |
| 2000         | 34935.779 | 151000             | 2.98*10^9  | 85.30           |

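The table shows that as the Clock\_period constraint is relaxed, the area can be made smaller, but the operating time rises accordingly, so the throughput drops. Notably, at Clock\_period = 1400 ps the slack time is already negative, so the minimum timing constraint this design can meet is around 1400 ps.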
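Because the same design is synthesized at each constraint, the operating time should scale linearly with the clock period. The sketch below checks this using the In\_valid/Out\_valid edge times listed for each run; the dictionary name `runs` and the helper `operating_time` are hypothetical.

```python
# clock period (ps) -> (in_valid rising edge, out_valid falling edge), in ps
runs = {
    1500: (129750, 243000),
    1800: (155700, 291600),
    2000: (173000, 324000),
}


def operating_time(in_valid_rise_ps, out_valid_fall_ps):
    """Time from in_valid going high to out_valid going low."""
    return out_valid_fall_ps - in_valid_rise_ps


# Number of clock periods each run takes.
cycles = {period: operating_time(*edges) / period for period, edges in runs.items()}
```

Every run comes out at 75.5 clock periods, so the operating time is simply 75.5 times the chosen Clock\_period.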

# 2. Version with pipeline:

Here I placed a register after every multiplier to cut the pipeline, and I added one more pipeline stage within the adder part, so that the clock frequency exceeds 1.25 GHz.
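To illustrate the idea, here is a minimal cycle-by-cycle Python model of such a pipeline: one register stage after the multipliers and one extra register stage inside the adder tree. Splitting the adder into three partial sums of three products each is an assumption for illustration, as are all names; the report only states that one extra stage was cut in the adder part.

```python
def pipelined_mac(windows, weights):
    """Cycle-by-cycle model: multiply stage -> partial-sum stage -> final sum."""
    prod_reg = None     # 9 products, registered right after the multipliers
    partial_reg = None  # 3 partial sums, the extra stage inside the adder tree
    outputs = []
    # Two trailing bubbles (None) flush the two register stages.
    for window in list(windows) + [None, None]:
        # The final adder reads the partial sums registered in the last cycle.
        if partial_reg is not None:
            outputs.append(sum(partial_reg))
        # Update registers back to front so each stage sees the old values.
        partial_reg = (
            [sum(prod_reg[i:i + 3]) for i in range(0, 9, 3)]
            if prod_reg is not None else None
        )
        prod_reg = (
            [x * w for x, w in zip(window, weights)]
            if window is not None else None
        )
    return outputs


# Hypothetical 9-element windows and weights.
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9]
windows = [list(range(9)), list(range(1, 10)), [2] * 9]
results = pipelined_mac(windows, weights)
expected = [sum(x * w for x, w in zip(win, weights)) for win in windows]
```

Each result appears two cycles after its window enters, but a new window can be accepted every cycle, and each stage contains less logic between registers, which is what allows a shorter clock period.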

#### Clk\_period: 1000ps:

```
clock clk (rise edge)                                1000.00    1000.00
clock network delay (ideal)                             0.00    1000.00
Multiple_reg_0__31_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)      0.00    1000.00 r
library setup time                                    -16.38     983.62
data required time                                               983.62

data required time                                               983.62
data arrival time                                               -983.51

slack (MET)                                                        0.11
```

```
Combinational area: 34513.076252
Buf/Inv area: 2804.025633
Noncombinational area: 8782.525383
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (No wire load specified)

Total cell area: 43295.601635
Total area: undefined
```

In\_valid positive: 88500ps
Out\_valid negative: 165000ps

#### Clk\_period: 800ps:

```
clock clk (rise edge)                                 800.00     800.00
clock network delay (ideal)                             0.00     800.00
Multiple_reg_4__25_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)      0.00     800.00 r
library setup time                                    -17.93     782.07
data required time                                               782.07

data required time                                               782.07
data arrival time                                               -782.04

slack (MET)                                                        0.03
```

```
Combinational area: 35205.451290
Buf/Inv area: 2722.144352
Noncombinational area: 8782.525383
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (No wire load specified)

Total cell area: 43987.976673
Total area: undefined
```

In\_valid positive: 70800ps
Out\_valid negative: 132000ps

#### Clk\_period: 750ps (frequency is faster than 1.25GHz):

```
clock clk (rise edge)                                 750.00     750.00
clock network delay (ideal)                             0.00     750.00
Multiple_reg_1__29_/CLK (ASYNC_DFFHx1_ASAP7_75t_R)      0.00     750.00 r
library setup time                                    -17.94     732.06
data required time                                               732.06

data required time                                               732.06
data arrival time                                               -732.04

slack (MET)                                                        0.02
```

```
Combinational area: 36251.012284

Buf/Inv area: 2941.660830

Noncombinational area: 8782.525383

Macro/Black Box area: 0.0000000

Net Interconnect area: undefined (No wire load specified)

Total cell area: 45033.537666

Total area: undefined
```

In\_valid positive: 66375ps
Out\_valid negative: 123750ps

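After finishing the gate-level simulation, I again went back to the synthesis step, changed clock\_period in the syn.tcl file, synthesized again, and compared the resulting area and time. Here I used clock\_period values of 750, 800, and 1000 ps; the results are in the table below.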

| Clock_period (ps) | Area (µm²) | Operating_time (ps) | Throughput (OPS) | Area efficiency (GOPS/mm²) |
|--------------|-----------|--------------------|------------|-------------------------|
| 750          | 45033.54  | 57375              | 7.84*10^9  | 174.16                  |
| 800          | 43987.98  | 61200              | 7.35*10^9  | 167.16                  |
| 1000         | 43295.60  | 76500              | 5.88*10^9  | 135.86                  |

# Brief conclusion:

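Comparing the two sets of results, in both the without-pipeline and with-pipeline versions, tightening the timing constraint increases the area and raises the throughput. Comparing the designs with and without pipeline, the pipelined circuit has an extra layer of registers, so its area is intuitively larger. Because those registers cut the combinational path and re-align the signals once along the way, clk\_period can be pushed lower. The comparison also shows that although pipelining costs more area, it allows a shorter clock period, which makes it a good tool for trading space for time.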
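The trade-off can be quantified from the two tables. This sketch compares the fastest run of each version (1500 ps without pipeline, 750 ps with pipeline) using the area and operating-time figures reported above; the variable names are hypothetical.

```python
# Fastest reported run of each version (values from the two tables).
no_pipe = {"clk_ps": 1500, "area_um2": 39025.411, "op_time_ps": 113250}
pipe = {"clk_ps": 750, "area_um2": 45033.54, "op_time_ps": 57375}

# Extra area paid for the pipeline registers, as a fraction.
area_overhead = pipe["area_um2"] / no_pipe["area_um2"] - 1

# How many times shorter the operating time becomes.
speedup = no_pipe["op_time_ps"] / pipe["op_time_ps"]

# Clock frequency of the pipelined run in GHz (1000 / period in ps).
freq_ghz = 1000 / pipe["clk_ps"]
```

For these runs the pipeline costs about 15% more area and shortens the operating time by a factor of about 1.97, at a clock frequency of about 1.33 GHz.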

The figure below shows the time difference from in\_valid positive to out\_valid negative, which is the operating time.



# 3. Optimization

Plot the block diagram of the designed kernel:



#### Clk\_period: 570ps:

```
clock clk (rise edge)                                 570.00     570.00
clock network delay (ideal)                             0.00     570.00
Multiple_reg_1_31_/CLK (DFFHQNx1_ASAP7_75t_R)           0.00     570.00 r
library setup time                                     -7.33     562.67
data required time                                               562.67
data arrival time                                                562.67
slack (MET)                                                        0.03
```

```
Combinational area: 32064.569420
Buf/Inv area: 3868.015712
Noncombinational area: 6529.740359
Macro/Black Box area: 0.0000000
Net Interconnect area: undefined (No wire load specified)

Total cell area: 38594.309779
```

Operating time: 29925ps

Throughput = 1.50\*10^10 (OPS)
Area efficiency: 389.6 (GOPS/mm<sup>2</sup>)

#### Clk\_period: 565ps:

| Point                                           | Incr   | Path     |
|-------------------------------------------------|--------|----------|
| clock clk (rise edge)                           | 565.00 | 565.00   |
| clock network delay (ideal)                     | 0.00   | 565.00   |
| `Multiple_reg_1_55_/CLK (DFFHQNx1_ASAP7_75t_R)` | 0.00   | 565.00 r |
| library setup time                              | -17.43 | 547.57   |
| data required time                              |        | 547.57   |
| data required time                              |        | 547.57   |
| data arrival time                               |        | -547.56  |
| slack (MET)                                     |        | 0.01     |

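| Item                  | Area                                |
|-----------------------|-------------------------------------|
| Combinational area    | 32331.908319                        |
| Buf/Inv area          | 4048.107872                         |
| Noncombinational area | 6537.905166                         |
| Macro/Black Box area  | 0.000000                            |
| Net Interconnect area | undefined (No wire load specified)  |
| Total cell area       | 38869.813485                        |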

Operating time: 29715ps

Throughput = 1.51\*10^10 (OPS)
Area efficiency: 389.6 (GOPS/mm<sup>2</sup>)