## A NEW LAYOUT STYLE FOR HIGH-PERFORMANCE CIRCUITS

### **Fernando MORAES**

CPGCC - Universidade Federal do Rio Grande do Sul Av. Bento Gonçalves 9500 - Campus do Vale - Bloco IV CP 15064 - CEP 91501-970 Porto Alegre - BRASIL Phone: (+55) 51 336 83 99/Ext. 6828 - Fax: (+55) 51 336 55 76 e-mail: moraes@inf.ufrgs.br

### **Abstract**

Automatic layout synthesis of random logic circuits is the way to obtain the best trade-off between area, delay and power. Classic approaches, such as standard cells, are penalized in area and power consumption because the basic cells are oversized to drive the heavy output loads that may eventually be needed. This paper presents a new layout style for random logic circuits, with three characteristics that improve electrical performance: reduction of parasitic capacitances (minimum use of the polysilicon layer and cell synthesis with a minimum number of diffusion gaps), transistor sizing without changes in placement and routing (resulting in constant area for different sizing solutions), and the use of 3 metal layers and stacked contacts for routing.

### I. Introduction

The standard-cell approach is currently considered by VLSI designers to be the best solution for synthesizing random logic circuits. It is a sound choice, but it does not exploit all the benefits that the technology offers:

- the standard cells are designed to drive a heavy output load, which wastes silicon area and consumes more power than the minimum necessary;
- the logic functions must be mapped onto the standard cells and, given the limited number of functions in a library, some functions are mapped onto many standard cells, which increases the circuit delay and area.

The main advantages of the automatic layout synthesis are:

- <u>Technology independence</u>. All design rules are input parameters of the generator. The same circuit can be generated for different design rules (for example 0.7 or 0.5 μm) simply by changing the technology parameter description file.
- <u>No cell libraries</u>. All gates are generated on the fly, according to the specific needs of the circuit. The design and validation of cells, as well as library maintenance, are therefore eliminated.
- <u>Free technology mapping</u>. All combinations of complex gates (AOIs) can be used. The advantages of AOI gates are area, delay and power reduction (when compared with the standard-cell approach). As shown in [REI95], the average reduction in the number of transistors when using AOI gates is 35%.
- <u>Transistor sizing possibility</u>. This facility makes it possible to meet the user constraints, as in the full-custom approach.

In order to synthesize optimized circuits, the layout synthesis tools must also consider the following characteristics:

- a/ Reduce parasitic elements. For sub-micron technologies (transistor length smaller than 1 μm), the delay induced by parasitic capacitances and resistances tends to be larger than the transistor delay. To improve circuit performance, the diffusion capacitance (side-wall and area) and the length of the polysilicon connecting dual transistors ('N' and 'P' gates) must be reduced. Parasitic diffusion capacitance is minimized by the linear-matrix style, which places the transistors horizontally and connects them by abutment, giving a minimum side-wall capacitance. Two topologies can be used to reduce polysilicon length: the standard-cell topology (in the linear-matrix style the routing is done between diffusion lines, which increases polysilicon length, while in the standard-cell approach the routing is done between the rows of cells) or the placement of supply lines between transistors [HWA93] [KIM92]. The routing also induces parasitic elements, which are common to any layout style. To reduce them, the placement must be improved with performance-driven algorithms [SUT93] [KOI95].
- b/ Take technology evolution into account. New layout synthesis tools have to implement the routing with 3 or more metal layers [TER94] and use stacked contacts. If many metal layers are used, the "transparency" concept [REI88][JOH95] follows naturally: routing is allowed over the cells, which considerably reduces silicon area. Stacked contacts allow connections between non-adjacent layers, such as between diffusion and metal2 or between metal1 and metal3. Experience shows that stacked contacts are useful for connecting drain/source nodes to metal2.
- c/ Estimate parasitic elements. This is an important condition for integrating high-level synthesis (or logical synthesis) with layout synthesis. After the logical synthesis, the parasitic elements (routing, diffusion and polysilicon) are *evaluated* by the layout synthesis tool, which returns the expected delay [MOR95]. If the delay constraint is not met, the following approaches can be used: transistor sizing [AUV91], template selection [FAN95] (for the standard-cell approach), buffer insertion [AZE92], or a second logical synthesis with modified cost functions to improve performance. Delay prediction reduces design time, because layout synthesis, layout extraction and electrical simulation are no longer needed to obtain the circuit delay.

Section II presents two different layout styles. Section III describes the new layout style and preliminary results, and section IV the strategy used to choose the initial solution for the transistor widths. Finally, we present our conclusions and future work.

### II. Layout synthesis approaches

The layout style used in the TRAGO synthesis tool [MOR90] is based on the gate-matrix approach. As in any other automatic layout synthesis approach, there are no pre-defined cells: the cells are generated as a function of the load and the routing. In this tool, placement and routing are done before cell synthesis, and the cells are generated over the routing. The generated layout is actually a symbolic description, which results in a poor transistor density (transistors per square millimeter).

This methodology has the following characteristics:

- No routing channels: the connections are implemented over the transistors.
- The drain/source nodes are connected to the supply lines in the diffusion layer, which is a significant source of parasitics.
- The first metal layer is used for horizontal routing and the second metal layer for vertical routing.
- Only one contact is used in each drain/source, which also increases the parasitic resistances.
- Variable transistor width, according to the input netlist (in Spice format). However, increasing the transistor widths results in a large area increase, because the widths of all symbolic lines are increased.
- Only complementary gates are synthesized (inverters, NANDs, NORs and AOIs).

This tool can be used for small circuits or for academic purposes. Figure 1 (at the end of the paper) shows a TRAGO layout.

To solve the problems of the previous approach, the TROPIC tool [MOR93] was developed. This tool uses the linear-matrix style and generates each cell optimally (minimum number of diffusion gaps). It keeps the following characteristics: input from a Spice netlist, cells placed in horizontal rows, no routing channels between these rows, the first metal layer used for horizontal routing and the second for vertical routing.

The output of the TROPIC tool is a symbolic description (wires, contacts and transistors), which is translated into layout by a compaction tool. Figure 2 (at the end of the paper) shows a TROPIC layout. The advantage of breaking layout synthesis into two steps, topology synthesis and compaction, is a higher transistor density (for example, 5500 tr/mm² on average for a 1.0 µm technology). However, a compaction tool needs a large amount of memory and CPU time to compact a medium-size macro-cell (2,000 transistors), which limits the size of the circuits that can be generated.

The new features of this approach are:

- The supply connections are implemented with the first metal layer, with multiple contacts in each drain/source.
- The area increase for different transistor widths is smaller than in the previous approach, because only the width of the transistor lines increases (in the previous approach the widths of all lines are increased).
- The polysilicon layer connects complementary gates (vertically). The metal 2 layer is used for vertical connections between rows and to connect the 'N' and 'P' parts of each cell.
- It is possible to synthesize complementary gates and transmission gates.

The TROPIC tool can synthesize macro-cells of up to 5,000 transistors (when using the OPUS-CADENCE<sup>TM</sup> compaction tool). Area and delay estimation techniques for this layout style were presented in [MOR95]. The major problem of the TROPIC layout style is the increased length of the polysilicon lines, since the routing is done between transistors. This produces more parasitic elements, degrading electrical performance in complex circuits.

New characteristics must be added to the layout synthesis tools in order to improve electrical performances and reduce power consumption:

- minimum use of the polysilicon layer,
- routing with 3 or more metal layers,
- generation of complex circuits using simpler compaction algorithms (for example, a virtual grid instead of a constraint graph, if the area increase is lower than 15%),
- changing the transistors' widths **without** changes in the placement/routing (with no change in the used area).
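The virtual-grid idea mentioned in the list above can be illustrated with a minimal one-dimensional sketch. It is not the compaction algorithm of any tool described in this paper. It assumes a simplified model in which every symbolic element sits on an integer grid column, each column becomes as wide as its widest element, and columns are separated by a single minimum spacing. The function name, the element list and the dimensions are all hypothetical.

```python
from collections import defaultdict


def virtual_grid_compact(elements, min_spacing):
    """Map virtual-grid columns to physical x coordinates.

    elements: list of (column_index, width) pairs (hypothetical data).
    Each column is as wide as its widest element; consecutive columns
    are separated by min_spacing.
    """
    widest = defaultdict(float)
    for column, width in elements:
        widest[column] = max(widest[column], width)

    positions = {}
    x = 0.0
    for column in sorted(widest):
        positions[column] = x  # left edge of this column
        x += widest[column] + min_spacing
    return positions


# Hypothetical symbolic layout: three columns, widths in micrometers.
elements = [(0, 2.0), (0, 3.0), (1, 1.0), (2, 4.0), (2, 2.5)]
positions = virtual_grid_compact(elements, min_spacing=1.5)
print(positions)
```

Because every column is sized by its widest element, the result is generally larger than what a constraint-graph compactor would produce, which is the area penalty the list item refers to.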

### III. Proposed layout style

In modern technologies, the connections play a major role in the delay. Therefore, the routing must be evaluated *after* physical synthesis in order to verify whether the imposed delay constraints were met. As mentioned above, the following solutions can then be applied: transistor sizing, template selection and buffer insertion. Each of them leads to a *second* physical synthesis, which increases the design time.

The proposed layout style permits transistor sizing after physical synthesis without changes in the final area, eliminating the second physical synthesis. To do so, it uses the transparency concept, implementing the routing over the transistors. We also consider 3 metal layers for routing, stacked contacts and minimum use of the polysilicon layer.

The transistors are placed using the linear-matrix style, i.e., 2 horizontal diffusion lines. In this style, the transistors are connected by abutment whenever possible, in order to reduce the side-wall capacitances. The following example shows a serial connection between two transistors (1.0 µm technology, capacitances for the diffusion layer: $C_{AREA} = 0.31$ fF/µm² and $C_{side-wall} = 0.45$ fF/µm):



For this technology, the input capacitance of a minimum inverter is 4.31 fF, the same order of magnitude as the parasitic elements of the intermediate nodes. This example shows why the transistors must be connected by abutment and how important it is to reduce parasitic capacitances.
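A rough numerical sketch of this comparison is given below. Only the two capacitance coefficients come from the text. The transistor width, the diffusion lengths and the use of the full rectangle perimeter for the side-wall term are assumptions chosen for illustration, not the dimensions of the original example.

```python
C_AREA = 0.31      # fF/um^2, diffusion area capacitance (from the text)
C_SIDEWALL = 0.45  # fF/um, diffusion side-wall capacitance (from the text)


def diffusion_cap(width_um, length_um):
    """Capacitance (fF) of a rectangular diffusion region.

    Simplification: the whole perimeter contributes to the side-wall term.
    """
    area = width_um * length_um
    perimeter = 2.0 * (width_um + length_um)
    return C_AREA * area + C_SIDEWALL * perimeter


WIDTH = 4.0  # um, hypothetical transistor width

# Abutted transistors share one short diffusion strip without a contact.
abutted = diffusion_cap(WIDTH, 2.0)

# Separated transistors need two contacted regions joined by a wire
# (hypothetical 5 um each; the wire capacitance is ignored here).
separated = 2.0 * diffusion_cap(WIDTH, 5.0)

print(f"abutted:   {abutted:.2f} fF")
print(f"separated: {separated:.2f} fF")
```

Under these assumed dimensions both values are of the same order as the 4.31 fF inverter input capacitance quoted above, and the separated case is several times larger than the abutted one.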

In linear-matrix cells (or macro-cells) the used area is not minimal. As shown in [FUK95], smaller cells can be obtained when there are no constraints on transistor placement. However, that work considers only area optimization, not parasitic effects. The result is similar to gate-matrix, with many diffusion breaks and poor electrical performance.

The polysilicon layer is used only to connect dual gates ('N' and 'P' pairs). Wires may not be implemented in this layer, since RC<sub>poly</sub> is roughly 4 × RC<sub>metal1</sub>. Figure 3 illustrates two topologies that connect dual gates while minimizing the polysilicon length: external supply lines, as in the standard-cell approach, or supply lines between transistors. The first approach implies the same topological restrictions as standard cells (constant row height, fixed by the tallest cell), which restricts the transistor sizes. In the second approach, supply lines between transistors, the cells can have different heights (they are abutted by the supply lines) and there is no restriction on the transistors' widths.



(a) External supply lines

(b) Supply lines between transistors

Figure 3 - Polysilicon use and placement of supply lines

For the ECPD10 technology (1.0 µm), the delay of a 60 µm polysilicon line is equal to 50% of the minimum inverter delay. This explains why the polysilicon lines must be shortened.
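The benefit of shortening a polysilicon line can be sketched with the usual first-order model of a distributed RC line, whose Elmore delay is 0.5 · R · C and therefore grows with the square of the length. The resistance and capacitance per unit length used below are placeholders, not ECPD10 data, so only the scaling is meaningful, not the absolute values.

```python
def line_delay_ps(length_um, r_ohm_per_um, c_ff_per_um):
    """Elmore delay (ps) of a distributed RC line of the given length."""
    r_total = r_ohm_per_um * length_um   # ohm
    c_total = c_ff_per_um * length_um    # fF
    return 0.5 * r_total * c_total * 1e-3  # ohm * fF = 1e-3 ps


# Hypothetical polysilicon parameters (assumed, not from the paper).
R_POLY = 25.0  # ohm per um of line
C_POLY = 0.1   # fF per um of line

delay_60 = line_delay_ps(60.0, R_POLY, C_POLY)
delay_30 = line_delay_ps(30.0, R_POLY, C_POLY)

print(f"60 um line: {delay_60:.3f} ps")
print(f"30 um line: {delay_30:.3f} ps")
print(f"ratio: {delay_60 / delay_30:.1f}")
```

In this model, halving the line length divides its intrinsic delay by four, which is why keeping polysilicon only for the short vertical link between dual gates pays off.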

The next item to analyze is the connection of the polysilicon lines to the first metal layer. Again, two topologies can be considered: external and internal connection (figure 4). The first topology, external connection, also restricts the transistor sizes, because the placement of the contact over the polysilicon *fixes* the transistor widths. The second approach, internal connection, permits the transistors' widths to be increased (or reduced) *independently* of the routing. This characteristic makes it possible to change the transistors' widths *after* placement and routing, with no change in the final area.



Figure 4 - Connection to polysilicon lines

The supply lines (figure 5a) are implemented with the second metal layer, between the diffusion lines. A vertical wire in the first metal layer connects these lines to the drain/source nodes. The output drain/source nodes (figure 5b) are connected to a horizontal line between the supply wires, also in the second metal layer. Multiple contacts are placed in each drain/source (supply and output) to reduce parasitic elements.



Figure 5 - Drain/source nodes

The routing will be done with the following layers:

- horizontal: second metal layer,

- vertical: first and third metal layer.

The cell model for this layout style is presented in figure 6. Observe that the inputs and the outputs are accessible for routing, in metal1, between the diffusion lines. The output node can be accessed at more than one point, which gives more flexibility to the pin assignment algorithm. The cell is fully transparent to the third metal layer. Apart from the 3 horizontal lines in metal 2 (vcc, gnd and output), the rest of the cell is transparent to the second metal layer, allowing routing over the transistors (provided that stacked contacts can be placed over the drain/source nodes).

Figure 6 - Cell model

Figure 7 (at the end of the paper) shows a layout of the proposed style. Table 1 presents a comparative study between the presented layout styles.

ADDER CIRCUIT: 28 transistors

| Style    | Area (µm²) | Density (tr/mm²) | Rise time (ns) | Fall time (ns) | Average power (mW) | Cpar (fF) |
|----------|------------|------------------|----------------|----------------|--------------------|-----------|
| Proposed | 4462       | 6275             | 2.40           | 1.40           | 26.0               | 250       |
| TROPIC   | 4042       | 6927             | 2.49           | 1.49           | 23.7               | 208       |
| TRAGO    | 9576       | 2924             | 3.06           | 2.11           | 31.0               | 305       |

Technology: ECPD10 - 1.0 µm / Cload = 150 fF / Cpar = sum of the parasitic capacitances

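Distribution of the parasitic capacitances:

| Style    | Cpar diffusion and poly (fF) | Cpar metal1 and metal2 (fF) | Cpar total (fF) |
|----------|------------------------------|-----------------------------|-----------------|
| Proposed | 113                          | 138                         | 250             |
| TROPIC   | 72                           | 136                         | 208             |
| TRAGO    | 92                           | 213                         | 305             |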

Table 1 - Comparative study between the presented layout styles
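The density column of Table 1 follows directly from the transistor count and the area. The short check below recomputes it from the published areas. The variable and function names are illustrative only.

```python
TRANSISTORS = 28  # adder circuit of Table 1

# Areas in square micrometers, as listed in Table 1.
areas_um2 = {"Proposed": 4462, "TROPIC": 4042, "TRAGO": 9576}


def density_per_mm2(transistors, area_um2):
    """Transistor density in transistors per square millimeter."""
    area_mm2 = area_um2 * 1e-6  # 1 mm^2 = 1e6 um^2
    return transistors / area_mm2


densities = {
    style: round(density_per_mm2(TRANSISTORS, area))
    for style, area in areas_um2.items()
}

for style, value in densities.items():
    print(f"{style:9s} {value} tr/mm^2")
```

The recomputed values agree with the density column of the table.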

The generated layout is not yet optimized. The main deficiencies are:

- there is no post-processing in the routing phase, resulting in a lot of redundant vias;
- the compaction tool does not place the routing over the transistors.

Apart from that, the delay of the proposed style is almost the same as that of the TROPIC approach and, most importantly, the parasitic contribution of the diffusion and polysilicon layers is lower than in the other approaches. Once the routing optimization algorithm is implemented, the metal1/metal2 capacitances will be lower, improving the electrical performance of the circuits.

The topology of a macro-cell generated according to the proposed layout style is illustrated in figure 8 (this figure is a *symbolic* layout, because the compaction tool does not handle 3 metal layers). The routing between non-adjacent rows is implemented with the third metal layer. In this way, the routing channel uses metal1 and metal3 for vertical routing and metal2 for horizontal routing (a modified greedy algorithm is used).
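The layer assignment described above can be summarized as a small lookup. This is only one reading of the text, not the modified greedy router itself: it assumes that vertical wires between adjacent rows stay in metal1 and that only vertical wires between non-adjacent rows move to metal3. The function name and its arguments are hypothetical.

```python
def routing_layer(direction, adjacent_rows=True):
    """Return the metal layer for a wire segment.

    direction: "horizontal" or "vertical".
    adjacent_rows: for vertical segments, whether the connected rows
    are adjacent (assumed rule: metal1 if so, metal3 otherwise).
    """
    if direction == "horizontal":
        return "metal2"
    if direction == "vertical":
        return "metal1" if adjacent_rows else "metal3"
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical segments: (direction, rows are adjacent).
segments = [("horizontal", True), ("vertical", True), ("vertical", False)]
layers = [routing_layer(direction, adjacent) for direction, adjacent in segments]
print(layers)
```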



Figure 8 - Symbolic layout of a circuit generated according to the proposed approach

### IV. Initial solution for transistor widths

For the linear-matrix approach, two main transistor sizing topologies are available: regular and customized. In the regular solution, all transistors ('N' and 'P') have the same width. In the customized one, transistor sizes are defined according to the load and the structure of each cell, resulting in different transistor widths. An alternative solution, mobility compensation for the 'N' and 'P' transistors, gives poor results in terms of electrical performance (except for inverter arrays).

Figure 9 illustrates the delay as a function of transistor widths, for different loads. As shown, for small transistor widths (w=2μm to w=10μm), the delay of the circuit is controlled by the load, including parasitic capacitances. For larger widths (w>16μm) the delay is almost independent of load variations and is dominated by the constant parasitic contribution (a function of the layout style). As a result, the width corresponding to the beginning of the low sensitivity region can be used to define the *regular solution*. The circuit implemented with this width represents a good trade-off between area, speed and power dissipation. Transistor widths greater than this value result in unnecessary expense in area and power (this is the strategy used in pre-characterized approaches to ensure the functionality of the cells).



Figure 9 - Delay as a function of transistor widths, for different output loads

Our results indicate that the low sensitivity region, for different circuits and technologies (from 2.0 µm to 0.7 µm), begins when the transistor width is around 8 to 10 times the minimum size. This is the initial solution for the transistor widths.
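The selection of the regular width can be sketched with a toy delay model. The model below is an assumption made for illustration, not the one behind figure 9: the delay is taken as proportional to (C_load + C_fixed + c_w · W) / W, so it tends to a constant as W grows. The coefficients, the width step and the 10% threshold are hypothetical.

```python
def delay(width_um, c_load_ff, c_fixed_ff=20.0, c_per_um_ff=2.0):
    """Toy gate delay (arbitrary units): total capacitance over drive."""
    c_total = c_load_ff + c_fixed_ff + c_per_um_ff * width_um
    return c_total / width_um


def regular_width(widths, c_load_ff, threshold=0.10):
    """First width whose next step improves the delay by less than threshold."""
    for w, w_next in zip(widths, widths[1:]):
        gain = (delay(w, c_load_ff) - delay(w_next, c_load_ff)) / delay(w, c_load_ff)
        if gain < threshold:
            return w
    return widths[-1]


widths = list(range(2, 42, 2))  # candidate widths in um (hypothetical)
chosen = regular_width(widths, c_load_ff=150.0)
print(f"regular width: {chosen} um")
```

A real flow would take the curve from electrical simulation or from the delay prediction of [MOR95] instead of this closed-form model. The stopping rule, which looks for the width beyond which extra area buys little speed, is the part that mirrors the text.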

After the layout synthesis, using the regular solution, the transistors' sizes are calculated [AUV91], taking into account the routing capacitances. The sizing solution allows a better symmetry between rise and fall times, and a total active area reduction.

### Conclusion

In this paper, a new layout style was presented. It minimizes the diffusion capacitances and the polysilicon length by using 3 metal layers for routing. Preliminary results indicate that the approach is sound. However, conclusive data require the router to be completed (post-processing to eliminate redundant vias).

The proposed layout style targets synthesis for sub-micron technologies (low content of parasitic elements), resulting in circuits that can operate at high frequencies with low power consumption.

Future work includes:

- development of a placement algorithm that minimizes the length difference between wires, giving a uniform delay induced by parasitic capacitances and making delay prediction easier;
- power consumption estimation, by analysis of the activity of switching nodes;
- integration of the physical synthesis into the logical synthesis tools, once an accurate prediction of area, delay and power is available.

### References

- [AZE92] **N.Azemard, S.Amat, M.Mellah, D.Auvergne**, "A real characterization based on buffer selection algorithm", PATMOS 1993, pp. 1-9.
- [AUV91] **D.Auvergne, N.Azemard, V.Bonzom, D.Deschacht, M.Robert**, "Formal sizing rules of CMOS circuits", EDAC 91, Amsterdam, 1991, pp. 96-100.
- [FAN95] **C.Fan, W.Jone**, "Time optimization by gate resizing and critical path identification", IEEE Trans. on CAD, Vol. 14, No 2, February 95, pp. 204-217.
- [JOH95] **M.Johann, R.Reis**, "A full-over-the-cell routing model", VLSI 95, Japan, pp. 845-850.
- [HWA93] **C.Hwang, Y.Hsieh, Y.Lin, Y.Hsu**, "An efficient layout style for two-metal CMOS leaf cells and its automatic synthesis", IEEE Trans. on CAD, Vol. 12, No 3, March 93, pp. 410-423.
- [KIM92] **S.Kim, R.M.Owens, M.J.Irwin**, "Experiments with a performance driven module generator", 29th ACM/IEEE Design Automation Conference, 1992, pp. 687-690.
- [KOI95] **T.Koide, M.Ono, S.Wakabayashi, Y.Nishimaru, N.Yoshida**, "A new performance driven placement method with the Elmore delay model for row based VLSI", VLSI 95, Japan, pp. 405-412.
- [FUK95] **M.Fukui, N.Shinomiya, T.Akino**, "A new layout synthesis for leaf cell design", VLSI 95, Japan, pp. 259-264.
- [MOR90] **F.Moraes, R.Reis**, "Ferramenta para síntese automática de módulos em lógica aleatória", V SBMICRO, Campinas (Brazil), 1990, pp. 12-21.
- [MOR93] **F.Moraes, N.Azemard, M.Robert, D.Auvergne**, "Flexible macrocell layout generator", 4th ACM/SIGDA Physical Design Workshop, Los Angeles, 1993, pp. 105-116.
- [MOR95] **F.Moraes, L.Torres, M.Robert, D.Auvergne, R.Reis**, "Performance prediction for automatic layout synthesis", X SBMICRO, Canela (Brazil), 1995, pp. 89-98.
- [REI88] **R.Reis, R.Gomes, M.Lubaszewski**, "An efficient design methodology for standard cell circuits", ISCAS 88, Helsinki, pp. 1213-1216.
- [REI95] **A.Reis, M.Robert, D.Auvergne, R.Reis**, "Associating CMOS transistors with BDD arcs for technology mapping", Electronics Letters, Vol. 31, No 14, July 1995.
- [SUT93] **S.Sutanthavibul, E.Shragowitz, R.Lin**, "An adaptive timing-driven placement for high performance VLSI's", IEEE Trans. on CAD, Vol. 12, No 10, October 93, pp. 1488-1498.
- [TER94] **M.Terai, K.Nakajima, K.Takahashi, K.Sato**, "A new approach to over-the-cell channel routing with three metal layers", IEEE Trans. on CAD, Vol. 13, No 2, February 94, pp. 187-200.



Figure 1 - TRAGO layout



Figure 2 - TROPIC layout



Figure 7 - Proposed layout