Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbalanced buffer insertion on high fanout designs #2090

Open
Dolu1990 opened this issue Feb 6, 2024 · 3 comments
Open

Unbalanced buffer insertion on high fanout designs #2090

Dolu1990 opened this issue Feb 6, 2024 · 3 comments
Labels
question The user needs help

Comments

@Dolu1990
Copy link

Dolu1990 commented Feb 6, 2024

Description

Hi,

My setup:
I had a design with a high fanout part, where i had to read a few register based memory array (~2530 muxes to drive from one address).

Symptoms :
In such case, it seems that the buffer insertion done by the flow is very unbalanced, because that critical path was using a chain of 13 buffers (typical can out of 10), while an utopian balanced fanout would be able to reach 10^13 gates.

Here is an image to ilustrate, where in pink you can see the buffer chain for the high fanout net :
image

Here is for reference the critical path :

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock clk (rise edge)
   0.00    0.00   clock network delay (ideal)
   0.00    0.00 ^ EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/CLK (sky130_fd_sc_hd__dfxtp_4)
   0.46    0.46 v EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/Q (sky130_fd_sc_hd__dfxtp_4)
   0.28    0.75 v EU0_ExecutionUnitBase_pipeline_execute_0_SrcStageables_SRC2[1]_sky130_fd_sc_hd__or2_2_B/X (sky130_fd_sc_hd__or2_1)
   0.24    0.98 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__and4_2_A/X (sky130_fd_sc_hd__and4_1)
   0.21    1.19 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__a31oi_2_B1_Y_sky130_fd_sc_hd__o21bai_2_A1/Y (sky130_fd_sc_hd__o21bai_4)
   0.26    1.45 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[2]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.23    1.69 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[3]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.25    1.94 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[4]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.23 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[5]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_2)
   0.25    2.48 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[6]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.77 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[7]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21oi_2_B1/Y (sky130_fd_sc_hd__a21oi_4)
   0.22    2.99 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[8]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o211a_2_C1/X (sky130_fd_sc_hd__o211a_1)
   0.24    3.23 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[10]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    3.43 ^ Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A2_sky130_fd_sc_hd__and3_2_X_B_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.15    3.59 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A3_sky130_fd_sc_hd__a21oi_2_Y/Y (sky130_fd_sc_hd__a21oi_2)
   0.29    3.88 v wire486/X (sky130_fd_sc_hd__buf_4)
   0.43    4.31 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    4.51 v wire479/X (sky130_fd_sc_hd__buf_12)
   0.44    4.95 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__inv_2_Y/Y (sky130_fd_sc_hd__inv_2)
######### chain start here
   0.30    5.24 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_4)
   0.32    5.57 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][5]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.31    5.87 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][7]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__clkbuf_8)
   0.32    6.19 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_6)
   0.36    6.55 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_8)
   0.37    6.92 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_8)
   0.31    7.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__clkbuf_8)
   0.35    7.58 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__buf_6)
   0.33    7.91 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.32    8.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[4][17]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__buf_12)
   0.45    8.68 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_16)
   0.39    9.07 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_A_4/X (sky130_fd_sc_hd__buf_12)
   0.44    9.51 ^ Lsu2Plugin_logic_sharedPip_stages_0_ADDRESS_PRE_TRANSLATION[13]_sky130_fd_sc_hd__buf_2_X/X (sky130_fd_sc_hd__buf_12)
######### chain end here
   0.66   10.17 v Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][19]_sky130_fd_sc_hd__mux4_2_A0/X (sky130_fd_sc_hd__mux4_2)
   0.42   10.59 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X_A2_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.21   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X/X (sky130_fd_sc_hd__a31o_1)
   0.00   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__dfxtp_2_D/D (sky130_fd_sc_hd__dfxtp_1)
          10.80   data arrival time

So, i don't know if you had similar issues / case ?
ex : clock enable / reset tree / ...

Proposal

No response

@Dolu1990
Copy link
Author

Dolu1990 commented Feb 7, 2024

I got another case, with a very long chain of buffer on another part of the design (16 buffer), but this time the average fanout is quite different. Bellow, in pink is the buffer chain path.
image

I would say there is no good reason for so much buffers, but i don't know much. Let's me know if you have an idea :)

@mo-hosni
Copy link
Collaborator

Hi,
In the second example, most of the buffers are either for long wires or max cap. Typically, the fanout buffers are names start with fanout. Also, in the first example, all the buffer names are different. Are you sure these buffers are inserted by OpenROAD in the Design Optmizations?
Also, can you send a reproducible or the configuration and timing constraints used?
Thanks

@Dolu1990
Copy link
Author

Dolu1990 commented Feb 25, 2024

Hi,

In the second example, most of the buffers are either for long wires or max cap. Typically, the fanout buffers are names start with fanout

I lost the exact setup i used in the screen shot, sorry, but i still had the outflow. I toke a video where i show a bit the paths.
https://drive.google.com/file/d/1WWhCPqjMZksxn_hHWuh5DWztjdk9ewGj/view?usp=drive_link
Seems to me that the first part of the buffer chain is achieving very little (not traveling far nor driving much, especialy compared to other paths in the design)

Also, in the first example, all the buffer names are different. Are you sure these buffers are inserted by OpenROAD in the Design Optmizations?

In the verilog i feed openlane with, there is no handwritten buffer insertions, so those buffer come from somewere in the whole openlane flow, i don't know more.

Also, can you send a reproducible or the configuration and timing constraints used?

Here is a design which can be used to recreate similar issues to the first case :
design_nax.zip

And a video of the case 1:
case1.mkv.zip
Also, one thing to notice in that case 1, i digged a bit more to see what was connected to the buffers, and basicaly in that loooong chain of buffers, for each layer it is generaly : "~8 gates + ~2 buffers". So each layer scale the path very little.

For case 2, I do not have the original verilog file, but here is the synthethised netlist (in case of, it may just be fine to swap it in the design_nax.zip)
nax.v.zip

Thanks :)

@donn donn added the question The user needs help label Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question The user needs help
Projects
None yet
Development

No branches or pull requests

3 participants