# 6/16 MEETING

## Area vs Cycle time



#### Flow

- 1. Select a cycle time for synthesis.
- 2. report\_timing: critical path
- 3. report\_timing -path end: slack at every endpoint
- 4. report\_resources: check the cell architecture
- 5. Collect the information and design the test patterns
- 6. Gate-level simulation
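Steps 1 to 4 can be repeated for each candidate cycle time. The following is a minimal sketch of a generator for such a dc\_shell script, not the script used for these results. The clock port name `clk` and the 0.05 ns uncertainty are taken from the timing report in these notes; the plain `compile` command and the report file names are assumptions.

```python
# Sketch: emit a dc_shell (Tcl) script that repeats steps 1-4 per cycle time.
# Assumptions: the design is already read and linked, the clock port is "clk",
# and "compile" plus the *.rpt file names are placeholders.
# A real flow would likely re-read the RTL before each run.

def dc_script(periods_ns, clock_port="clk", uncertainty_ns=0.05):
    lines = []
    for period in periods_ns:
        tag = f"{period:.1f}".replace(".", "p")  # 2.6 -> "2p6"
        lines += [
            f"# ---- cycle time {period:.1f} ns ----",
            f"create_clock -name clk -period {period:.1f} [get_ports {clock_port}]",
            f"set_clock_uncertainty {uncertainty_ns:.2f} [get_clocks clk]",
            "compile",
            f"report_timing > timing_{tag}.rpt",
            f"report_timing -path end > endpoints_{tag}.rpt",
            f"report_resources > resources_{tag}.rpt",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The four cycle times that appear in the Test table of these notes.
    print(dc_script([2.2, 2.6, 3.8, 4.4]))
```

Keeping one report file per cycle time makes it easy to compare the critical path and the chosen cell implementations across runs.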

## report\_timing

| Point                                   | Incr     | Path   |
|-----------------------------------------|----------|--------|
| clock clk (rise edge)                   | 0.00     | 0.00   |
| clock network delay (ideal)             | 0.00     | 0.00   |
| MEM_forwd_in1_3_reg/CK (DFFSRHQX1)      | 0.00 #   | 0.00 r |
| MEM_forwd_in1_3_reg/Q (DFFSRHQX1)       | 0.20     | 0.20 f |
| U1726/Y (OR2X1)                         | 0.18     | 0.38 f |
| U1738/Y (INVX3)                         | 0.12     | 0.49 r |
| U1662/Y (AO22XL)                        | 0.15     | 0.64 r |
| U1770/Y (OR2X4)                         | 0.09     | 0.73 r |
| BRANCH/r_data_1[3] (Branch_Unit)        | 0.00     | 0.73 r |
| BRANCH/U18/Y (INVX1)                    | 0.03     | 0.76 f |
| BRANCH/U22/Y (NAND2X1)                  | 0.05     | 0.81 r |
| BRANCH/U11/Y (NAND2X1)                  | 0.05     | 0.85 f |
| BRANCH/U10/Y (OR4X4)                    | 0.21     | 1.06 f |
| BRANCH/U17/Y (NOR4X4)                   | 0.11     | 1.17 r |
| BRANCH/U33/Y (NAND2X2)                  | 0.05     | 1.22 f |
| BRANCH/U110/Y (XOR2X4)                  | 0.07     | 1.29 f |
| BRANCH/U111/Y (NOR2BX4)                 | 0.05     | 1.34 r |
| BRANCH/U13/Y (NAND2X2)                  | 0.03     | 1.38 f |
| BRANCH/U5/Y (NOR2X2)                    | 0.06     | 1.43 r |
| BRANCH/Branch (Branch_Unit)             | 0.00     | 1.43 r |
| U1723/Y (OR2X2)                         | 0.09     | 1.52 r |
| U1834/Y (INVX4)                         | 0.06     | 1.58 f |
| U1880/Y (AO22XL)                        | 0.20     | 1.79 f |
| U1721/Y (AO21X2)                        | 0.16     | 1.94 f |
| sub_246_aco/A[4] (RISCV_DW01_sub_7)     | 0.00     | 1.94 f |
| sub_246_aco/U231/Y (NOR2X2)             | 0.06     | 2.00 r |
| sub_246_aco/U280/Y (NAND2X1)            | 0.05     | 2.05 f |
| sub_246_aco/U240/Y (NOR2X2)             | 0.05     | 2.10 r |
| sub_246_aco/U254/Y (NAND2X1)            | 0.05     | 2.15 f |
| sub_246_aco/U228/Y (CLKBUFX4)           | 0.11     | 2.25 f |
| sub_246_aco/U292/Y (NOR2X1)             | 0.06     | 2.32 r |
| sub_246_aco/U84/Y (XOR2X1)              | 0.06     | 2.38 r |
| sub_246_aco/DIFF[18] (RISCV_DW01_sub_7) | 0.00     | 2.38 r |
| mem_addr_I_o_reg_20_0/D (DFFSRHQX1)     | 0.00     | 2.38 r |
| data arrival time                       |          | 2.38   |
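To see where the 2.38 ns of this path is spent, the cumulative Path column can be split by hierarchical block. This is a rough sketch: the rule that a point named `BLOCK/cell/pin` belongs to `BLOCK`, and that everything else counts as top-level logic, is an assumption about the naming, and the flip-flop clock-to-Q delay is counted as top level.

```python
from collections import defaultdict

# (point, cumulative path delay in ns), copied from the table above.
PATH = [
    ("MEM_forwd_in1_3_reg/CK", 0.00), ("MEM_forwd_in1_3_reg/Q", 0.20),
    ("U1726/Y", 0.38), ("U1738/Y", 0.49), ("U1662/Y", 0.64), ("U1770/Y", 0.73),
    ("BRANCH/r_data_1[3]", 0.73), ("BRANCH/U18/Y", 0.76), ("BRANCH/U22/Y", 0.81),
    ("BRANCH/U11/Y", 0.85), ("BRANCH/U10/Y", 1.06), ("BRANCH/U17/Y", 1.17),
    ("BRANCH/U33/Y", 1.22), ("BRANCH/U110/Y", 1.29), ("BRANCH/U111/Y", 1.34),
    ("BRANCH/U13/Y", 1.38), ("BRANCH/U5/Y", 1.43), ("BRANCH/Branch", 1.43),
    ("U1723/Y", 1.52), ("U1834/Y", 1.58), ("U1880/Y", 1.79), ("U1721/Y", 1.94),
    ("sub_246_aco/A[4]", 1.94), ("sub_246_aco/U231/Y", 2.00),
    ("sub_246_aco/U280/Y", 2.05), ("sub_246_aco/U240/Y", 2.10),
    ("sub_246_aco/U254/Y", 2.15), ("sub_246_aco/U228/Y", 2.25),
    ("sub_246_aco/U292/Y", 2.32), ("sub_246_aco/U84/Y", 2.38),
    ("sub_246_aco/DIFF[18]", 2.38), ("mem_addr_I_o_reg_20_0/D", 2.38),
]


def delay_by_block(path):
    """Sum the delay added at each point, grouped by hierarchical block."""
    totals = defaultdict(float)
    prev = 0.0
    for point, arrival in path:
        parts = point.split("/")
        # "BLOCK/cell/pin" -> BLOCK; "cell/pin" or a block port -> top level
        block = parts[0] if len(parts) >= 3 else "top"
        totals[block] += arrival - prev
        prev = arrival
    return {name: round(total, 2) for name, total in totals.items()}


print(delay_by_block(PATH))
```

With these rows the split comes out as about 1.24 ns of top-level logic, 0.70 ns inside `BRANCH`, and 0.44 ns inside `sub_246_aco`. Differences of the Path column are used instead of the Incr column because the rounded increments do not always add up exactly.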

| Point                                   | Incr   | Path   |
|-----------------------------------------|--------|--------|
| clock clk (rise edge)                   | 0.00   | 0.00   |
| clock network delay (ideal)             | 0.00   | 0.00   |
| MEM_forwd_in1_3_reg/CK (DFFSRHQX1)      | 0.00 # | 0.00 r |
| MEM_forwd_in1_3_reg/Q (DFFSRHQX1)       | 0.32   | 0.32 f |
| U1711/Y (NOR3X2)                        | 0.20   | 0.51 r |
| U2144/Y (CLKBUFX3)                      | 0.16   | 0.68 r |
| U1888/Y (AOI22XL)                       | 0.05   | 0.73 f |
| U2444/Y (NAND2X1)                       | 0.08   | 0.81 r |
| add_216/A[0] (RISCV_DW01_add_3)         | 0.00   | 0.81 r |
| add_216/U383/Y (NAND2X1)                | 0.03   | 0.84 f |
| add_216/U237/Y (OAI21XL)                | 0.16   | 1.01 r |
| add_216/U273/Y (AOI21XL)                | 0.09   | 1.09 f |
| add_216/U350/Y (OAI21XL)                | 0.17   | 1.26 r |
| add_216/U265/Y (AOI21XL)                | 0.09   | 1.35 f |
| add_216/U355/Y (OAI21XL)                | 0.16   | 1.52 r |
| add_216/U267/Y (AOI21XL)                | 0.09   | 1.61 f |
| add_216/U356/Y (OAI21XL)                | 0.16   | 1.77 r |
| add_216/U266/Y (AOI21XL)                | 0.09   | 1.86 f |
| add_216/U357/Y (OAI21XL)                | 0.16   | 2.02 r |
| add_216/U268/Y (AOI21XL)                | 0.09   | 2.12 f |
| add_216/U249/Y (CLKINVX1)               | 0.05   | 2.16 r |
| add_216/U248/Y (NAND2X1)                | 0.03   | 2.19 f |
| add_216/U243/Y (NAND2X1)                | 0.06   | 2.25 r |
| add_216/U254/Y (AOI21X1)                | 0.06   | 2.31 f |
| add_216/U358/Y (OAI21XL)                | 0.15   | 2.46 r |
| add_216/U269/Y (AOI21XL)                | 0.09   | 2.55 f |
| add_216/U360/Y (OAI21XL)                | 0.18   | 2.73 r |
| add_216/U270/Y (AOI21XL)                | 0.09   | 2.83 f |
| add_216/U359/Y (OAI21XL)                | 0.16   | 2.99 r |
| add_216/U271/Y (AOI21XL)                | 0.10   | 3.09 f |
| add_216/U238/Y (OAI21X1)                | 0.12   | 3.21 r |
| add_216/U255/Y (AOI21X1)                | 0.07   | 3.28 f |
| add_216/U242/Y (CLKINVX1)               | 0.04   | 3.32 r |
| add_216/U241/Y (NAND2X1)                | 0.03   | 3.35 f |
| add_216/U240/Y (NAND2X1)                | 0.06   | 3.40 r |
| add_216/U257/Y (AOI21X1)                | 0.06   | 3.46 f |
| add_216/U247/Y (OAI21XL)                | 0.15   | 3.61 r |
| add_216/U272/Y (AOI21XL)                | 0.09   | 3.70 f |
| add_216/U388/Y (OAI21XL)                | 0.13   | 3.84 r |
| add_216/U258/Y (XNOR2XL)                | 0.12   | 3.96 f |
| add_216/SUM[31] (RISCV_DW01_add_3)      | 0.00   | 3.96 f |
| U1800/Y (AOI22X1)                       | 0.13   | 4.09 r |
| U1716/Y (OAI2BB1X2)                     | 0.04   | 4.12 f |
| sub_246_aco/A[29] (RISCV_DW01_sub_3)    | 0.00   | 4.12 f |
| sub_246_aco/U232/Y (XNOR2X1)            | 0.06   | 4.18 r |
| sub_246_aco/DIFF[29] (RISCV_DW01_sub_3) | 0.00   | 4.18 r |
| mem_addr_I_o_reg_31_0/D (DFFSRHQX1)     | 0.00   | 4.18 r |
| data arrival time                       |        | 4.18   |
|                                         |        |        |
| clock clk (rise edge)                   | 4.40   | 4.40   |
| clock network delay (ideal)             | 0.00   | 4.40   |
| clock uncertainty                       | -0.05  | 4.35   |
| mem_addr_I_o_reg_31_0/CK (DFFSRHQX1)    | 0.00   | 4.35 r |
| library setup time                      | -0.17  | 4.18   |
| data required time                      |        | 4.18   |
|                                         |        |        |
| data required time                      |        | 4.18   |
| data arrival time                       |        | -4.18  |
|                                         |        |        |
| slack (MET)                             |        | 0.00   |
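The closing rows of this report are plain arithmetic: required time = clock edge + clock network delay - uncertainty - setup time, and slack = required time - arrival time. A small worked check with the numbers from the report (the function name is only illustrative):

```python
from decimal import Decimal


def setup_slack(period, uncertainty, setup, arrival, clock_delay="0.00"):
    """Return (required time, slack) in ns for a single-cycle setup check."""
    required = (Decimal(period) + Decimal(clock_delay)
                - Decimal(uncertainty) - Decimal(setup))
    return required, required - Decimal(arrival)


# Values from the 4.40 ns report: uncertainty 0.05, library setup 0.17, arrival 4.18
required, slack = setup_slack(period="4.40", uncertainty="0.05",
                              setup="0.17", arrival="4.18")
print(required, slack)  # 4.18 0.00
```

`Decimal` is used so that the two-digit values from the report are not disturbed by binary floating-point rounding.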

## report\_timing -path end

| Endpoint               | Cell        | Path Delay | Edge | Path Required | Slack    |
|------------------------|-------------|------------|------|---------------|----------|
| mem_addr_I_o_reg[20]/D | (DFFSRHQX1) | 2.380441  | r | 2.380697      | 0.000255 |
| mem_addr_I_o_reg[26]/D | (DFFSRHQX1) | 2.380441  | r | 2.380697      | 0.000255 |
| mem_addr_I_o_reg[24]/D | (DFFSRHQX1) | 2.380441  | r | 2.381674      | 0.001232 |
| mem_addr_I_o_reg[22]/D | (DFFSRHQX1) | 2.380441  | r | 2.381703      | 0.001262 |
| mem_addr_I_o_reg[28]/D | (DFFSRHQX1) | 2.380441  | r | 2.381757      | 0.001316 |
| mem_addr_I_o_reg[30]/D | (DFFSRHQX1) | 2.380441  | r | 2.381905      | 0.001464 |
| ALU_result_4_reg[52]/D | (DFFSRHQX1) | 2.425121  | f | 2.426663      | 0.001543 |
| mem_addr_I_o_reg[31]/D | (DFFSRHQX1) | 2.379738  | r | 2.381884      | 0.002146 |
| mem_addr_I_o_reg[29]/D | (DFFSRHQX1) | 2.379138  | r | 2.381503      | 0.002364 |
| ALU_result_4_reg[48]/D | (DFFSRHQX1) | 2.419932  | f | 2.423176      | 0.003244 |
| ALU_result_4_reg[56]/D | (DFFSRHQX1) | 2.428117  | f | 2.432056      | 0.003938 |
| ALU_result_4_reg[43]/D | (DFFSRHQX1) | 2.385627  | r | 2.391814      | 0.006186 |
| ALU_result_4_reg[30]/D | (DFFSRHQX1) | 2.411844  | f | 2.419133      | 0.007289 |
| ALU_result_4_reg[47]/D | (DFFSRHQX1) | 2.384508  | r | 2.391814      | 0.007305 |
| ALU_result_4_reg[46]/D | (DFFSRHQX1) | 2.384037  | r | 2.391814      | 0.007776 |
| ALU_result_4_reg[45]/D | (DFFSRHQX1) | 2.383818  | r | 2.391814      | 0.007995 |
| ALU_result_4_reg[32]/D | (DFFSRHQX1) | 2.423296  | f | 2.432056      | 0.008760 |
| ALU_result_4_reg[44]/D | (DFFSRHQX1) | 2.382251  | r | 2.391814      | 0.009562 |
| ALU_result_4_reg[16]/D | (DFFSRHQX1) | 2.408828  | f | 2.419216      | 0.010387 |
| ALU_result_4_reg[36]/D | (DFFSRHQX1) | 2.421116  | f | 2.432056      | 0.010940 |
| ALU_result_4_reg[42]/D | (DFFSRHQX1) | 2.420979  | f | 2.432054      | 0.011075 |
| mem_addr_I_o_reg[27]/D | (DFFSRHQX1) | 2.370159  | r | 2.381977      | 0.011817 |
| ALU_result_4_reg[14]/D | (DFFSRHQX1) | 2.406993  | f | 2.419216      | 0.012223 |
| mem_addr_I_o_reg[17]/D | (DFFSRHQX1) | 2.363915  | r | 2.376270      | 0.012355 |
| mem_addr_I_o_reg[13]/D | (DFFSRHQX1) | 2.367330  | r | 2.381813      | 0.014482 |
| mem_addr_I_o_reg[14]/D | (DFFSRHQX1) | 2.364450  | r | 2.380689      | 0.016240 |
| ALU_result_4_reg[22]/D | (DFFSRHQX1) | 2.402434  | f | 2.419133      | 0.016699 |
| mem_addr_I_o_reg[15]/D | (DFFSRHQX1) | 2.364450  | r | 2.381449      | 0.016999 |
| ALU_result_4_reg[40]/D | (DFFSRHQX1) | 2.414895  | f | 2.432056      | 0.017161 |
| mem_addr_I_o_reg[19]/D | (DFFSRHQX1) | 2.359337  | r | 2.380622      | 0.021285 |
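The endpoint list is easier to read when grouped by register bank. Below is a minimal sketch that finds the worst slack per bank. It assumes the raw report uses a whitespace-separated layout of `endpoint (cell) delay edge required slack`, and it uses only four rows copied from the table above as sample input.

```python
import re
from collections import defaultdict

# Sample rows from the table above, in the layout assumed for the raw report.
REPORT = """\
mem_addr_I_o_reg[20]/D (DFFSRHQX1) 2.380441 r 2.380697 0.000255
ALU_result_4_reg[52]/D (DFFSRHQX1) 2.425121 f 2.426663 0.001543
mem_addr_I_o_reg[31]/D (DFFSRHQX1) 2.379738 r 2.381884 0.002146
ALU_result_4_reg[40]/D (DFFSRHQX1) 2.414895 f 2.432056 0.017161
"""

ROW = re.compile(
    r"^(\w+)\[(\d+)\]/D\s+\((\w+)\)\s+([\d.]+)\s+([rf])\s+([\d.]+)\s+(-?[\d.]+)$"
)


def worst_slack_by_bank(text):
    """Map each register bank to (worst slack in ns, bit index of that endpoint)."""
    banks = defaultdict(list)
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            bank, bit, _cell, _delay, _edge, _required, slack = m.groups()
            banks[bank].append((float(slack), int(bit)))
    # min() on (slack, bit) tuples picks the entry with the smallest slack
    return {bank: min(rows) for bank, rows in banks.items()}


print(worst_slack_by_bank(REPORT))
```

In the table above, every listed endpoint belongs to either `mem_addr_I_o_reg` or `ALU_result_4_reg`, so two banks are enough to summarize it.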

#### Related instructions

- 1. add, sub, srl, sll...
- 2. beq, bne
- 3. load, store...

### Five-stage pipeline

- 1. Instruction Fetch: IF
- 2. Instruction Decode: ID
- 3. Execute (ALU): EX
- 4. Memory: MEM
- 5. Write back to register: WB

#### Instruction flow

- 1. Load data
- 2. Cause a load-use hazard: MEM forwarding
- 3. Cause a data hazard: WB forwarding
- 4. Run through every instruction
- 5. Make the ALU path longer
- 6. Take control of the next instruction address (branch)
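The two forwarding cases in steps 2 and 3 can be illustrated with a textbook-style operand select. This is a sketch, not the RTL of this design: the function and signal names are hypothetical, and giving the MEM stage priority over the WB stage is the usual textbook choice, assumed here.

```python
def forward_select(rs, mem_rd, mem_regwrite, wb_rd, wb_regwrite):
    """Pick the source of one ALU operand: 'MEM', 'WB', or 'REG' (register file).

    rs is the source register index of the instruction in EX; mem_rd and wb_rd
    are the destination register indices of the instructions in MEM and WB.
    """
    if mem_regwrite and mem_rd != 0 and mem_rd == rs:
        return "MEM"  # newest value, still in the MEM stage
    if wb_regwrite and wb_rd != 0 and wb_rd == rs:
        return "WB"   # older value, about to be written back
    return "REG"      # no hazard: read the register file


# x5 is produced by the instruction in MEM and also by the one in WB:
print(forward_select(rs=5, mem_rd=5, mem_regwrite=True, wb_rd=5, wb_regwrite=True))  # MEM
```

Register x0 is excluded because it always reads as zero in RISC-V, so a write to it must never be forwarded.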

#### Check cell:

report\_resources

| Resource | Module           | Parameters             | Contained Resources      | Contained Operations           |
|----------|------------------|------------------------|--------------------------|--------------------------------|
| r444     | DW_cmp           | width=5                |                          | eq_120 eq_200_3                |
| r445     | DW_cmp           | width=5                |                          | eq_121 eq_200_4                |
| r469     | DW_cmp           | width=5                |                          | eq_118                         |
| r471     | DW_cmp           | width=5                |                          | eq_119                         |
| r473     | DW_cmp           | width=5                |                          | eq_122                         |
| r475     | DW_cmp           | width=5                |                          | eq_123                         |
| r477     | DW01_add         | width=33               |                          | add_214                        |
| r481     | DW01_add         | width=32               |                          | add_216                        |
| r485     | DW_cmp           | width=5                |                          | eq_244_3                       |
| r487     | DW_cmp           | width=5                |                          | eq_244_4                       |
| r489     | DW01_sub         | width=30               |                          | sub_246_aco                    |
| r491     | DW01_sub         | width=64               |                          | sub_294                        |
| r1213    | DW01_sub         | width=32               |                          | sub_1_root_sub_0_root_sub_     |
| r1215    | DW01_add         | width=32               |                          | add_0_root_sub_0_root_sub_     |

Implementation Report

| Cell                          | Module   | Current Implementation | Set Implementation |
|-------------------------------|----------|------------------------|--------------------|
| add_0_root_sub_0_root_sub_217 | DW01_add | pparch (area,speed)    |                    |
| sub_1_root_sub_0_root_sub_217 | DW01_sub | pparch (area,speed)    |                    |
| sub_246_aco                   | DW01_sub | pparch (area,speed)    |                    |
| add_214                       | DW01_add | pparch (area,speed)    |                    |
| sub_294                       | DW01_sub | pparch (area,speed)    |                    |
| add_216                       | DW01_add | pparch (area,speed)    |                    |

#### Check cell

- DW01\_add: pparch -> delay-optimized flexible parallel-prefix
- DW01\_sub: pparch -> delay-optimized flexible parallel-prefix
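For intuition about what a parallel-prefix adder does, here is a bit-level sketch of a Kogge-Stone adder. Kogge-Stone is one classic parallel-prefix topology, chosen only for illustration. The structure the tool actually builds for `pparch` is not known from these notes.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers (mod 2**width) with a Kogge-Stone carry network."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b        # generate: bit i creates a carry by itself
    p = a ^ b        # propagate: bit i passes an incoming carry along
    half_sum = p     # sum bits before carries are applied
    dist = 1
    while dist < width:                       # log2(width) prefix levels
        g = (g | (p & (g << dist))) & mask    # combine with the group 'dist' bits below
        p = (p & (p << dist)) & mask
        dist *= 2
    carries = (g << 1) & mask                 # carry into bit i comes from bits [0, i-1]
    return half_sum ^ carries


print(hex(kogge_stone_add(0x0000FFFF, 0x00000001)))  # 0x10000
```

For a 32-bit adder the loop runs five times, so the carry into the top bit is available after five combine levels instead of rippling through 31 bit positions.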







## Test

| Synthesis cycle time | Pattern 1 limit | Pattern 2 limit | Pattern 3 limit | Benchmark limit | Arrival time |
|----------------------|-----------------|-----------------|-----------------|-----------------|--------------|
| 2.2 ns               | 2.1 ns          | 2.2 ns          | 2.2 ns          | 2.2 ns          | 1.98 ns      |
| 2.6 ns               | 2.4 ns          | 2.4 ns          | 2.2 ns          | 2.5 ns          | 2.38 ns      |
| 3.8 ns               | 3.1 ns          | 3.1 ns          | 3.0 ns          | 3.6 ns          | 3.58 ns      |
| 4.4 ns               | 3.4 ns          | 3.3 ns          | 3.3 ns          | 4.1 ns          | 4.18 ns      |
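The gap between the synthesis target and the measured limits can be tabulated directly. This sketch reads "limit" as the shortest cycle time at which gate-level simulation still passes, which is an assumption because the notes do not define it. The field names are illustrative.

```python
# (synthesis cycle time, (pattern 1-3 limits), benchmark limit, arrival time), in ns
RESULTS = [
    (2.2, (2.1, 2.2, 2.2), 2.2, 1.98),
    (2.6, (2.4, 2.4, 2.2), 2.5, 2.38),
    (3.8, (3.1, 3.1, 3.0), 3.6, 3.58),
    (4.4, (3.4, 3.3, 3.3), 4.1, 4.18),
]


def margins(results):
    rows = []
    for synth, patterns, bench, arrival in results:
        rows.append({
            "synth": synth,
            # synthesis target minus the benchmark limit
            "bench_margin": round(synth - bench, 2),
            # synthesis target minus the most demanding pattern limit
            "tightest_pattern_margin": round(synth - max(patterns), 2),
            # negative: the benchmark passes below the reported arrival time
            "bench_vs_arrival": round(bench - arrival, 2),
        })
    return rows


for row in margins(RESULTS):
    print(row)
```

On these numbers the benchmark margin grows from 0.0 ns at a 2.2 ns target to 0.3 ns at 4.4 ns, and at 4.4 ns the benchmark limit (4.1 ns) is below the reported arrival time (4.18 ns).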

## Why is the bound not tight?

- 1. Some cases never occur.
- -> Some addresses are never reached.
- -> The instruction address is 32 bits, but the memory contains only 256 words, which uses only 10 bits.
- 2. The exact architecture of the computing cells is not known.
- 3. There may be further details I did not notice.
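The 10-bit figure in point 1 can be checked with a short calculation, assuming 4-byte words and byte addressing (the helper name is illustrative):

```python
def address_bits(words, bytes_per_word=4):
    """Number of byte-address bits needed to reach every word."""
    return (words * bytes_per_word - 1).bit_length()


used = address_bits(256)   # 256 words x 4 bytes = 1024 bytes
unused = 32 - used
print(used, unused)        # 10 22
```

Under that assumption the upper 22 address bits never change during simulation, so timing paths that only matter when those bits toggle would not be exercised by the test patterns.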