# Effective Analog ICs Floorplanning with Relational Graph Neural Networks and Reinforcement Learning

1<sup>st</sup> Davide Basso, 2<sup>nd</sup> Luca Bortolussi
*University of Trieste*, Trieste, Italy
davide.basso@phd.units.it, lbortolussi@units.it

3<sup>rd</sup> Mirjana Videnovic-Misic
*Infineon Technologies AT*, Villach, Austria
mirjana.videnovic-misic@infineon.com

4<sup>th</sup> Husni Habal
*Infineon Technologies AG*, Munich, Germany
husni.habal@infineon.com

Abstract—Analog integrated circuit (IC) floorplanning is typically a manual process in which a layout engineer plans the placement of components (devices and modules). The process is further complicated by the interdependence of the floorplanning and routing steps, numerous electrical and layout-dependent constraints, and the high level of customization expected in analog design. This paper presents a novel automatic floorplanning algorithm based on reinforcement learning, augmented by a relational graph convolutional neural network model that encodes circuit features and positional constraints. The combination of these two machine learning methods enables knowledge transfer across circuit designs with distinct topologies and constraints, increasing the generalization ability of the solution. Applied to 6 industrial circuits, our approach surpassed established floorplanning techniques in terms of speed, area, and half-perimeter wirelength. When integrated into a procedural generator for layout completion, it reduced overall layout time by 67.3% with an 8.3% mean area reduction compared to manual layout.

Index Terms—Reinforcement Learning, Graph Neural Networks, Analog Circuits, Physical Design.

### I. INTRODUCTION

Designing the layout of analog circuits is a crucial and complex task that requires significant expertise, because of the circuits' susceptibility to noise and parasitics and their stringent topological requirements. Layout engineers therefore often need multiple iterations to reach an optimal result. The procedure involves two closely entangled steps: floorplanning and routing. Metaheuristics such as simulated annealing (SA), particle swarm optimization (PSO), and genetic algorithms (GA) [1] have been employed to streamline the floorplanning step. However, they cannot exploit past or external knowledge to improve the exploration of the solution space, as each problem instance is optimized from scratch. Works such as [2] leverage machine learning (ML) solutions, specifically Graph Neural Networks (GNNs), to improve generalization in producing optimal floorplans across diverse circuit topologies. Reinforcement learning (RL) techniques have become effective in tackling combinatorial problems [3], such as floorplanning, by navigating the solution space and focusing on its most promising regions. Since floorplanning can be framed as a sequential decision-making problem, i.e., a Markov Decision Process (MDP), RL techniques have led to state-of-the-art outcomes in several digital layout applications [4]–[8]. Nevertheless, the use of RL in *analog* layout remains limited, calling for further exploration.

This work has been developed in the project HoLoDEC (project label 16ME0696) which is partly funded within the Research Programme ICT 2020 by the German Federal Ministry of Education and Research (BMBF) and partially supported by the PNRR project iNEST (Interconnected North-Est Innovation Ecosystem) funded by the European Union Next-GenerationEU (Piano Nazionale di Ripresa e Resilienza (PNRR) – Missione 4 Componente 2, Investimento 1.5 – D.D. 1058 23/06/2022, ECS\_00000043).

Fig. 1. Overview of the automatic layout pipeline.

In this paper, we propose a combination of Relational Graph Convolutional Neural Networks (R-GCNs) [9] and RL to create optimal floorplans for analog circuits of diverse types and topologies. The R-GCN model provides detailed circuit information to the RL agent, which combines it with spatial encodings from a Convolutional Neural Network (CNN) [10] to determine the best shape and placement for each component. We also integrate this methodology with the ANAGEN procedural generator framework [11], [12] and an Obstacle-Avoiding Rectilinear Steiner Minimal Tree (OARSMT) global router, as in [13], streamlining the pipeline for automatic analog IC layout generation. Figure 1 depicts an overview of the workflow. The key contributions of this paper are as follows:

- We train an R-GCN model to predict rewards for circuit placements. Once trained, it serves as an encoder of circuit, device, and geometric-constraint information for the RL agent, improving its generalization capabilities.
- We present an RL agent that combines graph and pixel-level representations to comprehensively describe circuit and problem characteristics. The agent's policy learns to select optimal shapes and positions for components while ensuring no overlaps and adherence to constraints such as symmetry and alignment. To our knowledge, this is the first time such an approach has been proposed.
- We validate our method on circuits of increasing complexity. Our approach outperforms traditional metaheuristics and existing RL-based methods in terms of area, proxy wirelength, speed, and cost metrics.
- Our enhanced procedural layout generation pipeline consistently shortens design time while matching or even surpassing the quality of manually designed layouts.

The remainder of the paper is organized as follows. Section II presents previous works. The background of GNNs and RL is introduced in Section III. Section IV details the problem setting and our R-GCN and RL-based floorplanning approach. Experimental results are provided in Section V, while conclusions are drawn in Section VI.

### II. RELATED WORKS

Floorplanning automation has relied extensively on metaheuristic techniques such as SA, GA, or PSO, combined with topological representations like Sequence Pair (SP) [14] or B\*-Tree [15], to minimize an objective function through stochastic search. Although they produce compact floorplans, these methods leave insufficient space for routing tracks, ultimately resulting in unusable layouts. Additionally, implementing geometric constraints is challenging and tends to further increase the already lengthy optimization runtimes, especially for complex circuits. Template-based generators [16], [17] offer an alternative by using fixed templates to find optimal placements. Recently, learning-based methodologies have emerged as promising alternatives. For instance, [2] trained a GNN to predict analog IC performance from device placement and plugged it into an SA optimizer, but did not use it to directly produce a floorplan or consider positional constraints. Gusmão et al. [18] developed an unsupervised encoder-decoder model using attention mechanisms and R-GCNs to embed topological constraints (symmetry and proximity) and generate floorplans; however, a validation step is required to remove overlaps, and no routing-related optimization metric is considered. Ahmadi et al. [19] trained an RL agent to place FinFET modules on a grid, minimizing symmetry and alignment errors, area, and wirelength, but the assumption of fixed device shapes limited the approach's flexibility. This work addresses the aforementioned limitations.

### III. PRELIMINARIES

#### A. Graph Neural Networks

Circuit netlists and layouts can be represented as graphs, which allows EDA methods to benefit from applying GNNs at various design stages [20]. A graph is described as a tuple G = (V, E), where V is the set of nodes and E is the set of edges. The neighborhood of node v is $\mathcal{N}(v) = \{u \mid (u,v) \in E\}$. A graph is directed if edge direction matters; otherwise, it is undirected. A graph with multiple types of nodes or edges is called heterogeneous; otherwise, it is homogeneous. GNNs are deep learning models that operate on graph-structured inputs, learning a continuous embedding vector per node through a message-passing process [21]. Over successive iterations, nodes exchange vector-based messages with adjacent nodes, enriching their state with neighborhood context. Graph Convolutional Neural Networks (GCNs) [22] aggregate features as follows:

$$h_u^{(l+1)} = \sigma \left( \sum_{v \in \mathcal{N}(u)} \frac{h_v^{(l)} W^{(l)}}{c_u} \right), \tag{1}$$

where $h_u^{(l+1)}$ is the updated feature vector of node u, $\sigma$ a non-linear differentiable activation function, l the index of the GCN layer, $h_v^{(l)}$ the feature vector of neighbor v, $W^{(l)} \in \mathbb{R}^{d \times d}$ a learnable weight matrix, and $c_u$ a normalization factor. R-GCN extends this aggregation to account for different relation (edge) types between nodes:

$$h_u^{(l+1)} = \sigma \left( W_0^{(l)} \cdot h_u^{(l)} + \sum_{r \in R} \sum_{v \in \mathcal{N}^r(u)} \frac{W_r^{(l)} \cdot h_v^{(l)}}{c_{u,r}} \right), \tag{2}$$

where $\mathcal{N}^r(u)$ is the set of neighbors of node u under relation $r \in R$, $W_r^{(l)} \in \mathbb{R}^{d \times d}$ is the learnable weight matrix of relation r, $W_0^{(l)} \in \mathbb{R}^{d \times d}$ is a learnable weight matrix for the node's self-connection in each layer, and $c_{u,r}$ is again a problem-specific normalization constant, e.g., $|\mathcal{N}^r(u)|$.
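To make Eq. (2) concrete, the following is a minimal, dependency-free Python sketch of a single R-GCN layer. It assumes the common normalization $c_{u,r} = |\mathcal{N}^r(u)|$ and ReLU as $\sigma$; the function names and the relation → node → neighbors dictionary layout are hypothetical and are neither the DGL API nor the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Return W · h for a (d_out x d_in) matrix W and a d_in-dimensional vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU, used here as the non-linearity sigma."""
    return [max(0.0, v) for v in x]


def rgcn_layer(
    h: Dict[int, Vector],
    neighbors: Dict[str, Dict[int, List[int]]],  # relation -> node -> neighbor ids
    W0: Matrix,
    Wr: Dict[str, Matrix],
) -> Dict[int, Vector]:
    """One R-GCN layer as in Eq. (2), assuming c_{u,r} = |N^r(u)|."""
    out: Dict[int, Vector] = {}
    for u, h_u in h.items():
        acc = matvec(W0, h_u)  # self-connection term W_0 h_u
        for r, adjacency in neighbors.items():
            nbrs = adjacency.get(u, [])
            for v in nbrs:
                msg = matvec(Wr[r], h[v])  # relation-specific message W_r h_v
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out[u] = relu(acc)
    return out
```

Because each relation has its own matrix $W_r$ and its own normalization, a node with many connectivity edges but a single symmetry partner still weighs the symmetry message on equal footing with the averaged connectivity messages.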

#### B. Reinforcement Learning

RL techniques train an agent to discover effective solutions through interactions with an environment, modeled as an MDP $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$. At each time step t, the agent is in a state $s_t \in \mathcal{S}$ and chooses an action $a_t \in \mathcal{A}$ to execute. Following the transition probability $\mathcal{P}$, the environment moves to a new state $s_{t+1}$ and the agent receives a reward $r_t \in \mathcal{R}$ indicating the impact of its action. Future rewards are discounted by $\gamma \in [0, 1]$ to balance the relevance of immediate versus future rewards. The agent eventually learns an optimal policy $\pi^*(a|s)$ that maximizes the expected discounted return $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$.
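Since a floorplanning episode ends once every block has been placed, the return is a finite sum. A small sketch, assuming the convention that $r_t$ is the reward received after action $a_t$, computes it for every step with the backward recursion $G_t = r_t + \gamma G_{t+1}$ (the function name is illustrative):

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return G_t = sum_k gamma^k r_{t+k} for every step t of a finite episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With $\gamma = 1$ the first entry is simply the sum of all rewards in the episode, so the intermediate and final rewards contribute additively.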

### IV. METHODOLOGY

#### A. Reinforcement Learning Formulation

Extending the floorplanning algorithm of Basso et al. [13], we integrate an R-GCN model for chip representation learning with an RL agent. The agent selects shapes for circuit components and places them on a discretized grid representing the layout space. The floorplanning MDP is defined as follows:

- States: The state $s_t$ combines the 32-dimensional embeddings of the circuit graph g and of the current instance $n_k$, produced by the R-GCN model, with local feature maps extracted at the pixel level by a CNN. The latter consist of a $32\times32$ grid representation $f_g \in \{0,1\}^{32\times32}$ and two reward-related masks, $f_{ds}, f_w \in [0,1]^{32\times32}$, which show the increase in empty space and in the wirelength proxy metric for each placement, similar to [4]. Additionally, three positional masks $f_p \in \{0,1\}^{3\times32\times32}$, also used for action masking, delineate the admissible placement cells for the next block, given the three possible shapes, the non-overlap requirement, and any optionally defined spatial constraints.
- Actions: The action $a_t$ at time step t consists of selecting one of three possible shapes for the current block $b_t$ and the grid cell where its lower-left corner is placed.
- Rewards: We define a partial reward $r_t$ as the negative increase of proxy wirelength and empty space in the floorplan after $a_t$. The end-of-episode reward is instead equal to the negative weighted sum of the floorplan's area, its half-perimeter wirelength and, if specified, its deviation from the target aspect ratio.

Fig. 2. 8-structure OTA circuit schematic with its graph representation. Violet edges denote vertical alignment and black edges connectivity. Nodes are colored according to the functional block type.

#### B. Preliminary Structure Recognition (SR) and Functional Block Configuration

Given an input schematic, we use Infineon's GCN-based SR tool [23] to detect the circuit's functional blocks. Following [13], we generate different shapes for each block by keeping the total device width, and hence the area, fixed while tailoring the internal routing and device placement to the recognized functional structure. These configurations are then provided to the RL agent.

#### C. R-GCN Circuit Representation Learning

An effective floorplanning algorithm should be able to generalize its performance across various circuit configurations and constraints. R-GCNs can handle graphs of varying size and topology, since their weights are shared across nodes and their neighborhood aggregation is permutation invariant. Works such as [6], [24], [25] proposed pre-training a GNN model on the supervised task of predicting the rewards of input circuit graphs. By aligning the training task with the RL agent's goals, the circuit embeddings produced for subsequent stages capture meaningful signals, improving the agent's decision-making and generalization capabilities. In our setting, shown in Figure 2, circuits are represented as undirected graphs where each node $v_i$ corresponds to a functional block or a single device. The edges $e_i$ represent relationships between nodes (u, v), which can be connectivity (if the nodes are connected in the netlist), horizontal or vertical alignment, or horizontal or vertical symmetry. A node feature vector $x_u \in X$ includes the block area, internal parameters such as transistor or resistor stripe width, terminal routing direction, pin count, and a 28-dimensional one-hot encoding of the block's functional structure (e.g., current mirror, differential pair, cascode).
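The graph encoding described above can be sketched as follows. Only the 28-way one-hot structure encoding and the five relation types come from the text; the exact order of the scalar features, the integer code for terminal routing direction, and the relation names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical names for the five relation types listed in the text.
RELATIONS = ("connectivity", "h_align", "v_align", "h_sym", "v_sym")
NUM_STRUCT_TYPES = 28  # size of the functional-structure one-hot encoding


def node_features(area: float, stripe_width: float, routing_dir: int,
                  pin_count: int, struct_type: int) -> List[float]:
    """Hypothetical layout: 4 scalar features followed by a 28-way one-hot."""
    if not 0 <= struct_type < NUM_STRUCT_TYPES:
        raise ValueError("unknown structure type index")
    one_hot = [0.0] * NUM_STRUCT_TYPES
    one_hot[struct_type] = 1.0
    return [area, stripe_width, float(routing_dir), float(pin_count)] + one_hot


def build_neighbors(edges: List[Tuple[int, int, str]]) -> Dict[str, Dict[int, List[int]]]:
    """Turn (u, v, relation) triples into per-relation neighbor lists.

    Each edge is stored in both directions because the circuit graph is undirected.
    """
    neighbors: Dict[str, Dict[int, List[int]]] = {r: {} for r in RELATIONS}
    for u, v, rel in edges:
        if rel not in neighbors:
            raise ValueError(f"unknown relation: {rel}")
        neighbors[rel].setdefault(u, []).append(v)
        neighbors[rel].setdefault(v, []).append(u)
    return neighbors
```

Keeping one adjacency structure per relation is what lets an R-GCN apply a separate weight matrix to connectivity, alignment, and symmetry edges.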

Fig. 3. R-GCN architecture for circuit reward prediction.

*R-GCN Pre-Training Setup:* Figure 3 shows the R-GCN model architecture, which consists of 4 R-GCN layers followed by a mean aggregation over nodes that produces the whole-graph embedding. This embedding is then fed into 5 fully connected (FC) layers to predict the reward value. The R-GCN training dataset comprises 21600 floorplans and their corresponding reward values, generated in 40 hours by optimizing placement w.r.t. area and proxy wirelength metrics using a mixture of SA, GA, and PSO. The circuits of interest vary in size and complexity and include operational transconductance amplifiers (OTAs), bias circuits, drivers, level shifters, clock synchronizers, comparators, and oscillators. Moreover, we ensured a balance between constrained and unconstrained floorplans. The supervised model is trained to minimize the mean squared error between the ground-truth and predicted reward associated with the input circuit graph. Training took 4 minutes on a single Nvidia A30 GPU.
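The graph-level readout and the training loss can be sketched in a few lines. This is an illustrative sketch only: the five FC layers are omitted because their widths are not reported, and the function names are hypothetical.

```python
from typing import Dict, List


def mean_readout(node_embeddings: Dict[int, List[float]]) -> List[float]:
    """Average all node embeddings into a single graph-level embedding."""
    if not node_embeddings:
        raise ValueError("graph has no nodes")
    vectors = list(node_embeddings.values())
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mse(predicted: List[float], target: List[float]) -> float:
    """Mean squared error between predicted and ground-truth rewards."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
```

A mean (rather than sum) readout keeps the graph embedding on a comparable scale for circuits with 3 or 19 blocks, which matters when one encoder must serve very different netlists.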

#### D. R-GCN and RL-based Floorplanning

Given the trained R-GCN model, we remove the final FC layers and use the remaining layers as the encoder for the RL agent, enhancing the transfer learning capabilities of our methodology. Figure 4 provides an overview of the RL architecture. We train the RL agent using a masked version of Proximal Policy Optimization (PPO) [26], a state-of-the-art on-policy algorithm.

*1) Designing Action Space and Masking:* Large action spaces can hinder RL convergence towards an optimal policy. Therefore, we discretize the layout canvas into a $32 \times 32$ grid $f_g$, balancing performance and accuracy by containing the action space while ensuring constraint satisfaction. The grid height H and width W are computed as $W = H = \sqrt{R_{\max} \sum_{i=1}^{m} A_{i}}$, where $A_{i}$ is the area of the $i^{th}$ circuit device and $R_{\max} = 11$ is the maximum empirically derived aspect ratio for a floorplan. Since H and W scale with the total area of the circuit's blocks, this design choice accommodates the placement of circuits of any complexity. The agent can choose from 3 candidate shapes for each circuit structure, similar to the flexibility human designers have. Combined with selecting the cell for the lower-left corner of a block, this results in an action space $\mathcal{A}$ of size $3\times32\times32=3072$. To prevent a further increase in action-space dimensionality, the agent does not choose which block to place next; instead, a heuristic inspired by [24] places blocks in order of decreasing size.
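A flat action index can be mapped back to a (shape, row, column) triple, and the placement order fixed in advance. In this sketch the shape-major flattening (chosen to match the $3\times32\times32$ layout of $f_p$) and the use of block area as "size" are assumptions; the helper names are hypothetical.

```python
from typing import List, Tuple

NUM_SHAPES = 3
GRID = 32  # the placement grid has GRID x GRID cells


def encode_action(shape: int, row: int, col: int) -> int:
    """Flatten (shape, row, col) into an index in [0, 3 * 32 * 32)."""
    return (shape * GRID + row) * GRID + col


def decode_action(index: int) -> Tuple[int, int, int]:
    """Inverse of encode_action: recover the shape and lower-left-corner cell."""
    shape, rest = divmod(index, GRID * GRID)
    row, col = divmod(rest, GRID)
    return shape, row, col


def placement_order(block_areas: List[float]) -> List[int]:
    """Block indices sorted by decreasing area, so the largest block is placed first."""
    return sorted(range(len(block_areas)), key=lambda i: block_areas[i], reverse=True)
```

Fixing the order outside the policy keeps the output at 3072 logits; letting the agent also pick the block would multiply the action space by the number of unplaced blocks.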

Fig. 4. Overview of the RL model, enriched with a CNN-based feature extractor and policy network.

As mentioned earlier, our floorplanning methodology handles fundamental positional requirements such as symmetry and alignment, and guarantees the absence of device overlap. Works such as [27] show that policy gradient algorithms can take advantage of masking procedures to avoid selecting invalid actions based on state information. Therefore, at each episode step, we generate three positional masks $f_p$, one for each candidate shape. These masks are obtained by combining two binary matrices: one representing the partial placement and the other the symmetry or alignment constraints. In the first one, a value of 1 indicates an available cell, while 0 signifies an already occupied one. The second one contains 1s where the corresponding constraint would be satisfied (based on the placement of the blocks belonging to a constraint group and the corresponding symmetry or alignment axis) and 0s where it would not. We map the real size (w, h) of each circuit instance onto the grid without shrinking it: $w_g = \left\lceil \frac{w \times 32}{W} \right\rceil$ and $h_g = \left\lceil \frac{h \times 32}{H} \right\rceil$ are the scaled width and height on the grid, rounded up so that a block's grid footprint always covers its real extent.
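A positional mask for one candidate shape can be sketched as below. The rounding follows the $w_g$, $h_g$ formulas above; the rule that a corner is valid only when the whole $w_g \times h_g$ footprint lies on the grid and is free, and the element-wise AND with the constraint matrix, are assumptions about how the two binary matrices are combined. Row 0 is taken as the bottom of the canvas, and the function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[int]]  # grid[row][col], row 0 at the bottom


def grid_size(length: float, canvas: float, cells: int = 32) -> int:
    """Scale a real length onto the grid, rounding up (e.g. w_g = ceil(w * 32 / W))."""
    return math.ceil(length * cells / canvas)


def positional_mask(free: Grid, constraint: Grid, w_g: int, h_g: int) -> Grid:
    """Return 1 where the block's lower-left corner may be placed, 0 elsewhere.

    free[r][c] is 1 for an empty cell; constraint[r][c] is 1 where the symmetry
    or alignment constraint would be satisfied (all ones if unconstrained).
    """
    rows, cols = len(free), len(free[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows - h_g + 1):  # the footprint must stay on the grid
        for c in range(cols - w_g + 1):
            if not constraint[r][c]:
                continue
            footprint_free = all(
                free[r + dr][c + dc] for dr in range(h_g) for dc in range(w_g)
            )
            mask[r][c] = int(footprint_free)
    return mask
```

Calling this once per candidate shape yields the three channels of $f_p$, which can then be passed to a masked policy so that invalid logits are never sampled.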

*2) State Design:* Since the R-GCN model does not supply detailed location information for specific circuit instances during an episode, we enrich the agent's state representation. As mentioned in Section IV-A, this is achieved by combining the embeddings of the current node $n_k$ (i.e., block $b_t$) and of the graph g (i.e., the circuit) with 6 grid-based masks: $f_g$, $f_w$, $f_{ds}$, and the three channels of $f_p$, which reflect the partial placement, the wirelength and dead space (i.e., empty space) increases, and the valid positions, respectively. This augmented state helps the RL agent identify the placements that maximize reward. Lai et al. [4] first proposed this approach, using $f_p$ and $f_g$ and computing $f_w$ as the increase in proxy wirelength when placing $b_t$ at a specific position. Our approach extends the positional mask $f_p$ to account for multiple device shapes and introduces the dead space mask $f_{ds}$, a normalized, continuous matrix indicating the increase in empty space if $b_t$ is placed at a certain location. To construct $f_{ds}$, we iterate over all available cells on the placement grid, compute the resulting dead space from placing $b_t$, and subtract the previous dead space value. Already occupied locations are set to the maximum increment of 1 to mask invalid positions. Figure 5 provides a visual example of $f_w$ and $f_{ds}$.
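The construction of $f_{ds}$ can be sketched as follows. This sketch works in grid-cell units rather than real areas and uses min-max scaling as the normalization; both are assumptions, since the text only states that the mask is normalized. The `valid` argument plays the role of the positional mask for the chosen shape, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (row, col, height, width) in grid cells


def dead_space(rects: List[Rect]) -> float:
    """DS = 1 - (sum of block areas) / (bounding-box area) of a partial floorplan."""
    if not rects:
        return 0.0
    top = max(r + h for r, _, h, _ in rects)
    bottom = min(r for r, _, _, _ in rects)
    right = max(c + w for _, c, _, w in rects)
    left = min(c for _, c, _, _ in rects)
    used = sum(h * w for _, _, h, w in rects)
    return 1.0 - used / ((top - bottom) * (right - left))


def dead_space_mask(placed: List[Rect], valid: List[List[int]],
                    h_g: int, w_g: int) -> List[List[float]]:
    """Normalized dead-space increase per candidate corner; invalid cells get 1."""
    base = dead_space(placed)
    rows, cols = len(valid), len(valid[0])
    raw: List[List[Optional[float]]] = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if valid[r][c]:
                raw[r][c] = dead_space(placed + [(r, c, h_g, w_g)]) - base
    values = [v for row in raw for v in row if v is not None]
    lo, hi = (min(values), max(values)) if values else (0.0, 0.0)
    span = hi - lo
    # Min-max scaling to [0, 1] (assumed); invalid positions take the maximum value 1.
    return [[1.0 if v is None else ((v - lo) / span if span > 0 else 0.0) for v in row]
            for row in raw]
```

For example, with a 2×2 block already at the origin, placing another 2×2 block directly beside or above it adds no dead space (value 0), while placing it diagonally doubles the bounding box and receives the worst score (value 1).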
Fig. 5. Dead space (left) and wire (right) masks. Darker areas highlight regions that yield higher rewards if a block is placed there.

*3) Policy Design:* Treating grid masks as images, as suggested by [5], leverages the effectiveness of CNNs in generating informative embeddings. For this reason, we concatenate the masks into a single tensor of dimension $6\times32\times32$ and feed it to a CNN. The convolution layers use a $3\times3$ kernel with a stride of 1, a padding of 1, and 16, 32, 32, 64, and 64 filter channels, followed by one FC layer that produces a 512-dimensional embedding vector. Finally, after concatenating the R-GCN embedding and the CNN outputs, this compact state representation is fed into both the value and policy networks. The policy includes a single FC layer that maps the input vector to a 512-dimensional one, followed by 3 deconvolution layers with a kernel size of $4\times 4$, a stride of 2, a padding of 1, and 32, 16, and 8 filter channels. The policy then generates a probability distribution over actions, allowing the agent to jointly decide the shape and location of the next block to be placed.
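The layer hyperparameters above can be checked with the standard output-size formulas for a 2-D convolution and a transposed convolution (dilation 1, no output padding). The decoder check assumes, as a hypothesis not stated in the text, that the 512-dimensional FC output is reshaped to $32\times4\times4$ before the first deconvolution.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a standard 2-D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Encoder: five 3x3 convolutions with stride 1 and padding 1 preserve the 32x32 size.
size = 32
for _ in range(5):
    size = conv_out(size, kernel=3, stride=1, padding=1)
encoder_size = size

# Decoder: assuming a 32 x 4 x 4 reshape of the 512-d vector (hypothetical),
# each 4x4 transposed convolution with stride 2 and padding 1 doubles the side.
size = 4
for _ in range(3):
    size = deconv_out(size, kernel=4, stride=2, padding=1)
decoder_size = size
```

Under that assumption the decoder grows the map from 4 to 8, 16, and finally 32 cells per side, i.e., back to the resolution of the placement grid on which actions are defined.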

*4) Reward Shaping:* As delineated in Section IV-A, the agent's goal is to minimize two primary metrics: area occupation and half-perimeter wirelength (HPWL). HPWL, a widely used approximation of the true wirelength, is computed as the sum of the half-perimeters of the bounding boxes of all nets:

$$\mathrm{HPWL} = \sum_{i=1}^{n} \left[ \max(x_i) - \min(x_i) + \max(y_i) - \min(y_i) \right], \tag{3}$$

where $x_i$ and $y_i$ are the coordinates of the endpoints (pins) of net i, and n is the total number of nets in the netlist. The dead space DS of a floorplan F is computed as $1 - \frac{\sum_{i=1}^m A_i}{F_{area}}$. To better guide the agent during an episode rollout, we provide intermediate rewards $r_t$ based on the increase in partial-floorplan dead space and HPWL, computed over the instances placed so far after taking action $a_t$. The intermediate reward is defined as:

$$r_t = -(\Delta_{ds} + \Delta_{HPWL}), \tag{4}$$

where $\Delta_{ds} = \mathrm{DS}_t - \mathrm{DS}_{t-1}$ and $\Delta_{\mathrm{HPWL}} = \mathrm{HPWL}_t - \mathrm{HPWL}_{t-1}$. Given the optional constraint of a fixed aspect ratio for the final floorplan, we define the agent's end-of-episode reward $\mathcal{R}$ as:

$$\mathcal{R} = -\left(\alpha \frac{F_{\text{area}}}{\sum_{i=1}^{m} A_i} + \beta \frac{\text{HPWL}}{\text{HPWL}_{\text{min}}} + \gamma (R^* - R)^2\right). \tag{5}$$

Here, $\alpha$, $\beta$, and $\gamma$ (the latter distinct from the discount factor of Section III-B) are weights set empirically to 1, 5, and 5, respectively, after extensive experimentation. They balance the area, wirelength, and fixed-outline error terms with respect to their magnitude and impact on the final floorplan quality. $\text{HPWL}_{\text{min}}$ is the minimum HPWL value, estimated through metaheuristic-based runs and used for normalization, while $R^*$ and R are the target and current floorplan aspect ratios, respectively. Finally, whenever the generated floorplan violates any predefined constraint, the agent is penalized with a reward of -50.
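Eqs. (3) to (5) translate directly into code. The sketch below uses the reported weights (1, 5, 5) as defaults; representing each net as a list of (x, y) pin coordinates and returning -50 *instead of* the regular end-of-episode reward on a constraint violation are assumptions, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def hpwl(nets: Sequence[Sequence[Point]]) -> float:
    """Eq. (3): sum over nets of the half-perimeter of the pins' bounding box."""
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def dead_space(block_areas: Sequence[float], floorplan_area: float) -> float:
    """DS = 1 - sum_i A_i / F_area."""
    return 1.0 - sum(block_areas) / floorplan_area


def step_reward(ds_prev: float, ds_now: float, hpwl_prev: float, hpwl_now: float) -> float:
    """Eq. (4): r_t = -(Delta_ds + Delta_HPWL)."""
    return -((ds_now - ds_prev) + (hpwl_now - hpwl_prev))


def final_reward(
    floorplan_area: float,
    block_areas: Sequence[float],
    hpwl_value: float,
    hpwl_min: float,
    aspect_target: Optional[float] = None,
    aspect_now: Optional[float] = None,
    alpha: float = 1.0,
    beta: float = 5.0,
    gamma: float = 5.0,  # Eq. (5) weight, not the RL discount factor
    violated: bool = False,
) -> float:
    """Eq. (5) with the reported weights; a violated constraint yields -50 instead."""
    if violated:
        return -50.0
    cost = alpha * floorplan_area / sum(block_areas) + beta * hpwl_value / hpwl_min
    if aspect_target is not None and aspect_now is not None:
        cost += gamma * (aspect_target - aspect_now) ** 2  # optional fixed-outline term
    return -cost
```

For instance, a floorplan twice as large as its total block area with twice the reference HPWL scores $-(1 \cdot 2 + 5 \cdot 2) = -12$, which shows how strongly the chosen weights favor wirelength over area.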

*5) RL Training Schedule:* Our methodology aims to develop a single RL agent capable of generating optimal floorplans for a wide array of circuit types and constraints. To achieve this, we use the hybrid curriculum learning (HCL) approach described in [28], which incrementally presents the agent with circuits of growing complexity. Starting with smaller circuits, we interleave them with randomly sampled new circuit instances and constraints. This maintains the agent's exposure to complex scenarios and prevents the loss of previously acquired knowledge, thereby enhancing transferability to new, unseen instances. To ensure sufficient diversity in the data, the RL agent is trained on 3 OTAs and 2 bias circuits, comprising 3, 5, 8, 3, and 9 blocks, respectively.

#### E. Routing and Final Layout Generation

Once a feasible floorplan is generated, we construct an OARSMT for each net to minimize wirelength while avoiding obstacles. Unlike [13], which required congestion estimation to reserve space for routing channels, our method yields routing-ready floorplans. The global routing tree is segmented into conduits, specifying connections and layers, which guide ANAGEN's router in finalizing the circuit connections. This approach enhances ANAGEN's capabilities and could make it competitive with state-of-the-art techniques [29], [30], by letting the RL placement engine focus on higher-level device arrangement while ANAGEN handles low-level place-and-route tasks.

### V. EXPERIMENTAL RESULTS

Our floorplanning pipeline is built in Python 3.9, using the DGL [31] and Stable Baselines3 [32] libraries to implement the R-GCN and RL models, respectively.

#### A. RL Placement Training Setup

Fig. 6. Episode reward mean and approximate Kullback-Leibler divergence during the RL agent's HCL training schedule.

To foster policy robustness and reduce convergence time, we use 16 parallel environments to gather experience. Following the HCL approach from Section IV-D5, the agent is trained on each circuit for 4096 episodes. During the first half of these episodes, no new constraints or circuits are introduced. After this phase, we begin sampling new circuit instances and constraints with probabilities $p_{\rm circuit}$ = 0.5 and $p_{\rm constraint}$ = 0.3, respectively. Figure 6 illustrates the benefits of the HCL procedure for convergence towards a stable policy: during circuit sampling, the agent recovers high rewards and maintains low approximate Kullback-Leibler divergence values, indicating strong generalization and robustness. Training the RL agent on a single Nvidia A30 GPU took 12 hours and 42 minutes.
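The schedule described above can be sketched as a generator of per-episode configurations. Drawing sampled circuits from the whole training pool, drawing constraints from a hypothetical per-circuit list of constraint sets, and the function name itself are all assumptions; only the episode count and the two probabilities come from the text.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


def hcl_episode_configs(
    circuits: List[str],                   # ordered by increasing complexity
    constraint_sets: Dict[str, List[str]],  # hypothetical constraint options per circuit
    episodes_per_circuit: int = 4096,
    p_circuit: float = 0.5,
    p_constraint: float = 0.3,
    seed: Optional[int] = None,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (circuit, constraint) pairs following a hybrid curriculum."""
    rng = random.Random(seed)
    for current in circuits:
        for episode in range(episodes_per_circuit):
            circuit, constraint = current, None
            if episode >= episodes_per_circuit // 2:
                # Second half of the stage: interleave sampled circuits and constraints.
                if rng.random() < p_circuit:
                    circuit = rng.choice(circuits)
                options = constraint_sets.get(circuit, [])
                if options and rng.random() < p_constraint:
                    constraint = rng.choice(options)
            yield circuit, constraint
```

The deterministic first half lets the agent settle on the new circuit before sampling reintroduces earlier and harder ones, which is the mechanism the text credits for avoiding forgetting.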

#### B. Comparison Against Baselines

We evaluate our approach against established metaheuristics, namely SA (the method used by the state-of-the-art automatic layout generator [30]), PSO, and GA, as well as the two methods from [13]: a combination of RL with SA and a pure RL technique, both based on SP. Unfortunately, the circuits of interest are incompatible with other fully automated layout generation engines [29], [30] due to technology constraints. Performance is measured on 3 industrial designs from the RL training set and 3 unseen ones, to validate transferability with zero-shot and few-shot fine-tuning on novel circuits. Few-shot fine-tuning refines a pre-trained RL agent by continuing its training on a specific problem or circuit instance, optimizing it for that context. For a fair comparison, congestion-aware device spacing is applied to all other approaches to allocate sufficient room for routing channels, since our methodology provides routing-ready floorplans. No constraints are imposed on any circuit.

Table I lists the interquartile mean and standard deviation of the computational runtime, floorplan HPWL, dead space, and associated reward for each algorithm. The best results are emphasized in bold, and the second- and third-best rewards are highlighted in underlined italics and plain underline, respectively. The unseen circuits are RS Latch, Driver, and Bias-2. As shown, our approach surpasses the baselines in terms of floorplan reward in all scenarios with proper fine-tuning, and in 4 out of 6 cases at zero-shot, while significantly improving runtime. Moreover, our methodology demonstrates strong transfer capability to unseen and more complex designs, even without additional fine-tuning. On new circuits, HPWL and dead space percentage are reduced by 38.7% and 66.8%, respectively, compared to past techniques. Few-shot fine-tuning further improves results compared to the zero-shot model for the same number of iterations. While training time is significant for

TABLE I
COMPARATIVE ANALYSIS OF R-GCN AND RL METHOD, ACROSS VARIOUS FINE-TUNED SETUPS, VERSUS PREVIOUS TECHNIQUES.

| Circuit  | # Struct. | Metric                                               | R-GCN RL 0-shot                                                                                                        | R-GCN RL 1-shot                                                                                                  | R-GCN RL 100-shot                                                               | R-GCN RL 1000-shot                                                                                   | SA                                                                                                    | GA                                                                                                     | PSO                                                                                                      | RL-SA [13]                                                                                            | RL [13]                                                                                                  |
|----------|-----------|------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| OTA-1    | 5         | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | $\begin{array}{c} \textbf{0.06} \pm \textbf{0.24} \\ 57.5 \pm 1.25 \\ 150.62 \pm 9.0 \\ -2.37 \pm 0.42 \end{array}$    | $7.97 \pm 0.31$ $47.19 \pm 1.7$ $196.89 \pm 11.03$ $-3.35 \pm 0.46$                                              | $15.88 \pm 0.3$ $47.38 \pm 4.32$ $175.94 \pm 18.61$ $-2.61 \pm 0.87$            | $174.13 \pm 3.55$ $43.93 \pm 1.82$ $131.34 \pm 23.76$ $-0.21 \pm 0.91$                               | $0.91 \pm 0.01$ $49.79 \pm 4.28$ $166.03 \pm 18.72$ $-2.04 \pm 0.68$                                  | $4.58 \pm 0.01$<br>$53.95 \pm 5.74$<br>$175.05 \pm 25.85$<br>$-2.46 \pm 1.13$                          | $6.86 \pm 0.03$<br>$46.75 \pm 3.16$<br>$164.32 \pm 11.05$<br>$-1.86 \pm 0.47$                            | $ 1.03 \pm 0.01  51.19 \pm 4.2  164.17 \pm 17.81  -1.97 \pm 0.67 $                                    | $53.08 \pm 0.66$ $49.05 \pm 3.93$ $166.96 \pm 19.75$ $-2.05 \pm 0.82$                                    |
| OTA-2    | 8         | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | $\begin{array}{c} \textbf{0.14} \pm \textbf{0.01} \\ 43.77 \pm 5.27 \\ 202.19 \pm 22.89 \\ -2.52 \pm 0.89 \end{array}$ | $9.2 \pm 0.09$<br>$33.18 \pm 1.95$<br>$154.27 \pm 5.79$<br>$-0.68 \pm 0.23$                                      | $28.0 \pm 0.2$<br>$33.18 \pm 2.34$<br>$164.14 \pm 9.37$<br>$-0.96 \pm 0.31$     | $287.03 \pm 0.88$<br>$35.57 \pm 1.62$<br>$168.29 \pm 7.52$<br>$-1.16 \pm 0.28$                       | $1.09 \pm 0.03$<br>$57.5 \pm 4.08$<br>$244.42 \pm 41.68$<br>$-3.97 \pm 1.54$                          | $\begin{array}{c} 4.98 \pm 0.02 \\ 57.3 \pm 6.77 \\ 237.3 \pm 30.17 \\ -3.68 \pm 1.33 \end{array}$     | $\begin{array}{c} 7.30 \pm 0.03 \\ 55.3 \pm 4.66 \\ 226.27 \pm 31.32 \\ -3.22 \pm 1.17 \end{array}$      | $\begin{array}{c} 1.3 \pm 0.1 \\ 55.68 \pm 4.48 \\ 229.17 \pm 23.26 \\ -3.51 \pm 0.85 \end{array}$    | $\begin{array}{c} 94.64 \pm 1.8 \\ 54.17 \pm 5.4 \\ 234.95 \pm 37.39 \\ -3.62 \pm 1.38 \end{array}$      |
| Bias-1   | 9         | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | 0.16 ± 0.15<br>53.16 ± 3.97<br>271.51 ± 39.03<br>-5.68 ± 1.51 | 8.96 ± 0.51<br>56.85 ± 1.54<br>313.63 ± 23.50<br>-7.40 ± 0.87 | 26.17 ± 0.22<br>59.93 ± 5.64<br>321.76 ± 24.11<br>-7.70 ± 0.83 | 303.62 ± 3.13<br>45.52 ± 7.22<br>191.91 ± 64.07<br>-2.53 ± 2.60 | 1.23 ± 0.01<br>67.79 ± 4.74<br>288.97 ± 40.05<br>-6.61 ± 1.7 | 5.4 ± 0.37<br>73.28 ± 5.06<br>269.7 ± 47.05<br>-6.29 ± 2.25 | 7.74 ± 0.48<br>68.22 ± 4.77<br>329.14 ± 55.57<br>-8.07 ± 2.29 | 1.57 ± 0.12<br>67.57 ± 4.29<br>285.19 ± 39.65<br>-6.5 ± 1.69 | 72.06 ± 1.27<br>67.58 ± 4.68<br>288.46 ± 39.74<br>-6.65 ± 1.65 |
| RS Latch | 7         | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | 0.11 ± 0.0<br>57.85 ± 2.54<br>108.0 ± 3.22<br>-4.04 ± 0.31 | 6.82 ± 1.09<br>53.17 ± 7.73<br>118.06 ± 13.34<br><u>-4.63 ± 1.27</u> | 17.65 ± 0.05<br>59.87 ± 2.25<br>128.63 ± 16.53<br>-5.55 ± 1.26 | 166.22 ± 0.75<br>33.76 ± 4.41<br>97.55 ± 7.58<br>-2.34 ± 0.62 | 0.99 ± 0.01<br>65.62 ± 4.32<br>127.99 ± 17.01<br>-5.44 ± 1.58 | 4.8 ± 0.01<br>69.99 ± 5.56<br>128.26 ± 15.95<br>-5.58 ± 1.59 | 6.71 ± 0.03<br>65.57 ± 4.27<br>127.92 ± 16.62<br>-5.39 ± 1.56 | 1.16 ± 0.03<br>63.84 ± 4.57<br>124.11 ± 15.17<br>-5.03 ± 1.44 | 42.96 ± 0.6<br>61.56 ± 3.63<br>120.22 ± 15.1<br>-4.69 ± 1.39 |
| Driver   | 17        | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | <b>0.25 ± 0.0</b><br>63.5 ± 4.79<br>1794.5 ± 173.3<br>-7.43 ± 1.29 | 11.6 ± 0.1<br>62.49 ± 1.73<br>1811.61 ± 163.9<br>-7.43 ± 1.05 | 79.64 ± 0.81<br>59.71 ± 3.35<br>1811.04 ± 133.7<br>-7.26 ± 0.84 | 814.81 ± 7.76<br>48.17 ± 4.11<br>1419.57 ± 61.88<br>-4.43 ± 0.53 | 1.85 ± 0.01<br>71.44 ± 7.22<br>1981.07 ± 319.51<br>-8.55 ± 2.41 | 6.63 ± 0.06<br>73.3 ± 7.16<br>2192.15 ± 448.45<br>-10.23 ± 3.23 | 10.4 ± 0.06<br>70.64 ± 6.24<br>2152.62 ± 294.24<br>-9.66 ± 2.23 | 2.24 ± 0.07<br>70.8 ± 7.82<br>1862.99 ± 372.89<br>-7.69 ± 2.72 | 155.08 ± 3.53<br>69.36 ± 8.42<br>1941.85 ± 516.42<br>-8.31 ± 3.73 |
| Bias-2   | 19        | Runtime (s)<br>Dead space (%)<br>HPWL (µm)<br>Reward | 0.34 ± 0.04<br>68.49 ± 6.81<br>3375.84 ± 235.86<br>-5.95 ± 0.67 | 12.74 ± 1.4<br>56.91 ± 4.39<br>2967.92 ± 174.56<br>-4.34 ± 0.7 | 88.73 ± 1.01<br>57.36 ± 3.22<br>2780.59 ± 225.7<br>-3.65 ± 0.57 | 849.86 ± 3.56<br>45.12 ± 2.66<br>2141.84 ± 150.64<br>-1.43 ± 0.47 | 2.07 ± 0.01<br>73.68 ± 4.2<br>2896.52 ± 177.68<br>-5.17 ± 0.68 | 6.91 ± 0.02<br>70.36 ± 4.71<br>3501.24 ± 490.11<br>-6.26 ± 1.65 | 11.6 ± 0.14<br>69.32 ± 5.75<br>3346.66 ± 498.34<br>-5.74 ± 1.67 | 2.42 ± 0.02<br>74.89 ± 4.76<br>2872.94 ± 369.75<br>-5.08 ± 1.3 | 244.62 ± 3.27<br>70.88 ± 6.37<br>2854.06 ± 366.79<br>-4.36 ± 1.34 |
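
Table I reports runtime, dead space, HPWL, and reward for each circuit. As a reading aid, the sketch below computes the two geometric metrics under common textbook definitions; whether the paper uses exactly these definitions is an assumption. HPWL is summed over nets using block centers as pin proxies (real pins may sit elsewhere on a block), and dead space is the share of the floorplan bounding box not covered by blocks, assuming no overlaps. The names `Block`, `hpwl`, and `dead_space_pct` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Axis-aligned block: lower-left corner (x, y), width w, height h, in µm."""
    x: float
    y: float
    w: float
    h: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def hpwl(blocks: Dict[str, Block], nets: List[List[str]]) -> float:
    """Sum over nets of the half perimeter of the box enclosing the connected
    block centers (assumption: pins are approximated by block centers)."""
    total = 0.0
    for net in nets:
        xs = [blocks[name].center()[0] for name in net]
        ys = [blocks[name].center()[1] for name in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def dead_space_pct(blocks: Dict[str, Block]) -> float:
    """Percentage of the bounding box not covered by blocks
    (assumes blocks do not overlap, so their areas can simply be summed)."""
    xmin = min(b.x for b in blocks.values())
    ymin = min(b.y for b in blocks.values())
    xmax = max(b.x + b.w for b in blocks.values())
    ymax = max(b.y + b.h for b in blocks.values())
    bbox_area = (xmax - xmin) * (ymax - ymin)
    used_area = sum(b.w * b.h for b in blocks.values())
    return 100.0 * (bbox_area - used_area) / bbox_area


if __name__ == "__main__":
    fp = {"A": Block(0, 0, 2, 2), "B": Block(2, 0, 2, 1)}
    print(hpwl(fp, [["A", "B"]]))  # 2.5
    print(dead_space_pct(fp))      # 25.0
```

In the toy floorplan, the 4 × 2 bounding box holds 6 µm² of blocks, so a quarter of it is dead space.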



Fig. 7. (a) RL-generated placement and OARSMT global routing, (b) post-adjustment floorplan and routing channel definitions, (c) resulting layout, (d) layout optimized through manual refinement, and (e) fully manual design.

TABLE II
COMPARISON OF AREA, DEAD SPACE, AND LAYOUT GENERATION TIME
BETWEEN OUR AUTOMATED METHOD AND MANUAL DESIGN.

| Circuit | Method         | Area (μm²)                             | Dead space (%)                       | Template<br>generation<br>time (s) | Manual<br>improvement<br>time (h) | Final layout<br>generation<br>time (h) |
|---------|----------------|----------------------------------------|--------------------------------------|------------------------------------|-----------------------------------|----------------------------------------|
| OTA     | Ours<br>Manual | <b>228.6</b> (-14.1%)<br>266.0         | <b>30.01</b> (<b>-5.98%</b>)<br>31.92 | 111.0                              | 0.17                              | <b>0.20</b> (-97.5%)                   |
| Bias    | Ours<br>Manual | 515.6 (+52.1%)<br><b>247.1</b>         | 54.01 (+8.68%)<br><b>49.32</b>       | 127.1                              | 1<br>-                            | 1.04 (-87.0%)<br>8                     |
| Driver  | Ours<br>Manual | <b>3584.7</b> (<b>-2.43%</b>)<br>3674.0 | <b>38.78</b> (<b>-3.82%</b>)<br>40.32 | 456.3                              | 20                                | <b>20.13</b> (<b>-37.1%</b>)<br>32      |

few-shot solutions, once fine-tuned, the model does not require retraining, making single-floorplan generation runtimes comparable to zero-shot ones and outperforming other approaches.

#### C. Evaluation of Complete Layouts

We plug our novel floorplanning algorithm into the pipeline proposed in [13] and validate its effectiveness by comparing the completed layouts of a 3-block OTA, a 9-block Bias, and a 17-block Driver from [12] against their manually designed counterparts. Notably, unlike the other two, the manual Bias layout was crafted without ANAGEN. The metrics of interest are the floorplan area, the dead space, and the time required to produce a DRC- and LVS-clean layout. The results in Table II underscore the algorithm's ability to generate valid floorplans quickly, enabling faster design iteration. The OTA and Driver layouts improve on their manual counterparts in both area and dead space, whereas the Bias layout occupies more area and has a higher dead-space percentage. The manual Bias layout benefits from the absence of ANAGEN's routing constraints, which yields a smaller footprint. Finally, Figure 7 compares the Driver layout generated automatically by our methodology with the manually designed one. Figures 7a and 7b illustrate the outcome of our floorplanning and global routing algorithm, which gives physical design engineers clear guidance on the expected wire flow. In more complex layouts, manual refinement of the routing channels, guided by the OARSMT, is still necessary to accommodate ANAGEN's routing sensitivities. For simpler layouts such as the OTA, however, routing channel generation was fully automated. Improving this aspect remains a focus for future work.
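
The "Ours" entries in the last column of Table II appear to equal the template generation time converted to hours plus the manual improvement time, with the percentage taken relative to the manual design time. This relationship is inferred from the reported numbers rather than stated in the text. The sketch below reproduces the Bias and Driver values under that assumption; the OTA manual design time is not listed, so its percentage is not recomputed. The function names `flow_hours` and `change_pct` are hypothetical.

```python
from typing import Optional


def flow_hours(template_s: float, manual_refinement_h: float) -> float:
    """Template generation time (seconds) plus manual refinement time (hours),
    expressed in hours and rounded to two decimals as in Table II."""
    return round(template_s / 3600.0 + manual_refinement_h, 2)


def change_pct(ours_h: float, manual_h: Optional[float]) -> Optional[float]:
    """Relative change of the automated flow time w.r.t. the manual design time, in %."""
    if manual_h is None:
        return None  # manual time not reported in the table
    return round(100.0 * (ours_h - manual_h) / manual_h, 1)


# (template time [s], manual improvement [h], manual design time [h]) from Table II
TABLE_II = {
    "OTA": (111.0, 0.17, None),
    "Bias": (127.1, 1.0, 8.0),
    "Driver": (456.3, 20.0, 32.0),
}

if __name__ == "__main__":
    for circuit, (tmpl_s, refine_h, manual_h) in TABLE_II.items():
        ours = flow_hours(tmpl_s, refine_h)
        print(circuit, ours, change_pct(ours, manual_h))
```

The template generation step contributes only minutes, so the total flow time is dominated by manual refinement, most visibly for the Driver.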

## VI. CONCLUSIONS AND FUTURE RESEARCH

This paper proposes a combined R-GCN and RL-based methodology for analog IC floorplanning that ensures compliance with alignment, symmetry, and non-overlap constraints. Our approach generalizes and transfers knowledge across different types of circuits, including new, unseen ones. The generated floorplans not only outperform established baselines but also, when integrated into a procedural generation pipeline, yield complete layouts of quality comparable to human-generated ones in significantly reduced runtime. In the future, we aim to improve the selection of the reward weights through optimization techniques and to augment the floorplanning algorithm with detailed routing information, further steering device placement towards more efficient routing configurations.
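
To make the three constraint types concrete, the sketch below checks a candidate floorplan for non-overlap, pairwise alignment, and mirror symmetry about a vertical axis. It is a minimal illustration under assumed definitions (bottom-edge alignment, same-row symmetry with equal block sizes, touching edges allowed), not the constraint formulation used by the RL environment; `Block`, `check_floorplan`, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

TOL = 1e-9  # tolerance for floating-point coordinate comparisons


@dataclass(frozen=True)
class Block:
    """Axis-aligned block: lower-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Block, b: Block) -> bool:
    # Strict inequalities: blocks that only share an edge do not overlap.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def bottom_aligned(a: Block, b: Block) -> bool:
    # Assumed alignment rule: equal bottom edges.
    return abs(a.y - b.y) <= TOL


def mirror_symmetric(a: Block, b: Block, axis_x: float) -> bool:
    # Assumed symmetry rule: same size, same row, centers mirrored about x = axis_x.
    same_size = abs(a.w - b.w) <= TOL and abs(a.h - b.h) <= TOL
    same_row = abs(a.y - b.y) <= TOL
    mid = ((a.x + a.w / 2) + (b.x + b.w / 2)) / 2
    return same_size and same_row and abs(mid - axis_x) <= TOL


def check_floorplan(blocks: Dict[str, Block],
                    align_pairs: List[Tuple[str, str]],
                    sym_pairs: List[Tuple[str, str]],
                    axis_x: float) -> List[str]:
    """Return human-readable constraint violations (empty list if compliant)."""
    violations = []
    for (na, a), (nb, b) in combinations(blocks.items(), 2):
        if overlaps(a, b):
            violations.append(f"overlap: {na}/{nb}")
    for na, nb in align_pairs:
        if not bottom_aligned(blocks[na], blocks[nb]):
            violations.append(f"alignment: {na}/{nb}")
    for na, nb in sym_pairs:
        if not mirror_symmetric(blocks[na], blocks[nb], axis_x):
            violations.append(f"symmetry: {na}/{nb}")
    return violations


if __name__ == "__main__":
    good = {"M1": Block(0, 0, 2, 2), "M2": Block(4, 0, 2, 2), "C": Block(2, 0, 2, 1)}
    bad = {"M1": Block(0, 0, 2, 2), "M2": Block(3, 0, 2, 2), "C": Block(2, 0, 2, 1)}
    pairs = [("M1", "M2")]
    print(check_floorplan(good, pairs, pairs, axis_x=3.0))  # []
    print(check_floorplan(bad, pairs, pairs, axis_x=3.0))
```

In an RL setting, checks of this kind are more naturally used to mask invalid actions before placement than to filter finished floorplans, but the sketch only validates a complete layout.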

#### REFERENCES

- [1] R. B. Singh, A. S. Baghel, and A. Agarwal, "A review on VLSI floorplanning optimization using metaheuristic algorithms," in *2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT)*. IEEE, 2016, pp. 4198–4202.
- [2] Y. Li, Y. Lin, M. Madhusudan, A. Sharma, W. Xu, S. S. Sapatnekar, R. Harjani, and J. Hu, "A customized graph neural network model for guiding analog IC placement," in *Proceedings of the 39th International Conference on Computer-Aided Design*, vol. 2020-November. New York, NY, USA: ACM, Nov. 2020, pp. 1–9. [Online]. Available: https://dl.acm.org/doi/10.1145/3400302.3415624
- [3] N. Mazyavkina, S. Sviridov, S. Ivanov, and E. Burnaev, "Reinforcement learning for combinatorial optimization: A survey," *Computers & Operations Research*, vol. 134, p. 105400, 2021.
- [4] Y. Lai, Y. Mu, and P. Luo, "Maskplace: Fast chip placement via reinforced visual representation learning," *Advances in Neural Information Processing Systems*, vol. 35, pp. 24019–24030, 2022.
- [5] R. Cheng and J. Yan, "On joint learning for solving placement and routing in chip design," in *Proceedings of the 35th International Conference on Neural Information Processing Systems*, ser. NIPS '21. Red Hook, NY, USA: Curran Associates Inc., 2024.
- [6] M. Amini, Z. Zhang, S. Penmetsa, Y. Zhang, J. Hao, and W. Liu, "Generalizable Floorplanner through Corner Block List Representation and Hypergraph Embedding," in *Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining*. New York, NY, USA: ACM, Aug. 2022, pp. 2692–2702. [Online]. Available: https://dl.acm.org/doi/10.1145/3534678.3539220
- [7] Y. Lai, J. Liu, Z. Tang, B. Wang, J. Hao, and P. Luo, "Chipformer: Transferable chip placement via offline decision transformer," in *International Conference on Machine Learning*. PMLR, 2023, pp. 18346–18364.
- [8] B. Yang, Q. Xu, H. Geng, S. Chen, and Y. Kang, "Miracle: Multi-action reinforcement learning-based chip floorplanning reasoner," in *2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, Mar. 2024, pp. 1–6. [Online]. Available: https://ieeexplore.ieee.org/document/10546767/?arnumber=10546767
- [9] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," in *The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15.* Springer, 2018, pp. 593–607.
- [10] K. O'Shea and R. Nash, "An introduction to convolutional neural networks," 2015. [Online]. Available: https://arxiv.org/abs/1511.08458
- [11] F. Passerini, K. Cherniak, F. Renneke, H. Habal, and C. Sandner, "Anagen: A methodology for analog circuit generation," in *IEEE CICC*, 2021.
- [12] D. Demiri, G. Capodivacca, D. Privato, H. Habal, and F. Renneke, "A procedural generator for the sizing and physical synthesis of a mosfet low-side driver," in 2023 19th Int. Conf. on SMACD, 2023, pp. 1–4.
- [13] D. Basso, L. Bortolussi, M. Videnovic-Misic, and H. Habal, "Fast ml-driven analog circuit layout using reinforcement learning and steiner trees," in 20th Int. Conf. on SMACD, 2024. [Online]. Available: https://arxiv.org/abs/2405.16951
- [14] F. Balasa and K. Lampaert, "Symmetry within the sequence-pair representation in the context of placement for analog design," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, no. 7, pp. 721–731, Jul. 2000. [Online]. Available: http://ieeexplore.ieee.org/document/851988/
- [15] M. Shunmugathammal, C. Christopher Columbus, and S. Anand, "A Novel B\*tree Crossover-Based Simulated Annealing Algorithm for Combinatorial Optimization in VLSI Fixed-Outline Floorplans," *Circuits, Systems, and Signal Processing*, vol. 39, no. 2, pp. 900–918, Feb. 2020, publisher: Birkhauser.
- [16] B. Prautsch, U. Eichler, and U. Hatnik, "Generating the generator: A user-driven and template-based approach towards analog layout automation," *Electronics*, vol. 12, no. 4, 2023. [Online]. Available: https://www.mdpi.com/2079-9292/12/4/1047
- [17] R. Martins, N. Lourenço, A. Canelas, R. Póvoa, and N. Horta, "Aida: Robust layout-aware synthesis of analog ics including sizing and layout generation," in *2015 International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD)*. IEEE, 2015, pp. 1–4.
- [18] A. P. L. de Gusmão, N. C. Gomes Horta, N. C. Correia Lourenço, and R. M. Ferreira Martins, "Scalable and order invariant analog integrated circuit placement with Attention-based Graph-to-Sequence deep models," *Expert Systems with Applications*, vol. 207, p. 117954, Nov. 2022, publisher: Elsevier Ltd. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0957417422011903
- [19] M. Ahmadi and L. Zhang, "Analog layout placement for FinFET technology using reinforcement learning," in *Proceedings IEEE International Symposium on Circuits and Systems*, vol. 2021-May. Institute of Electrical and Electronics Engineers Inc., 2021, ISSN: 0271-4310.
- [20] D. S. Lopera, L. Servadei, G. N. Kiprit, S. Hazra, R. Wille, and W. Ecker, "A survey of graph neural networks for electronic design automation," in 2021 ACM/IEEE 3rd Workshop on Machine Learning for CAD (MLCAD). Raleigh, NC, USA: IEEE, Aug. 2021, p. 1–6. [Online]. Available: https://ieeexplore.ieee.org/document/9531070/
- [21] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in *International conference on machine learning*. PMLR, 2017, pp. 1263–1272.
- [22] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in *International Conference on Learning Representations*, 2017. [Online]. Available: https://openreview.net/forum?id=SJU4ayYgl
- [23] R. Patel, H. Habal, and K. R. Venkata, "Machine learning based structure recognition in analog schematics for constraints generation," in *Design and Verification Conference (DVCon) Europe*, Oct. 2021.
- [24] A. Mirhoseini, A. Goldie, M. Yazgan, J. W. Jiang, E. Songhori, S. Wang, Y.-J. Lee, E. Johnson, O. Pathak, A. Nazi, J. Pak, A. Tong, K. Srinivasa, W. Hang, E. Tuncer, Q. V. Le, J. Laudon, R. Ho, R. Carpenter, and J. Dean, "A graph placement methodology for fast chip design," *Nature*, vol. 594, no. 7862, pp. 207–212, Jun. 2021, publisher: Nature Research. [Online]. Available: http://www.ncbi.nlm.nih.gov/pubmed/34108699
- [25] T. P. Le, H. T. Nguyen, S. Baek, T. Kim, J. Lee, S. Kim, H. Kim, M. Jung, D. Kim, S. Lee, and D. Choi, "Toward Reinforcement Learning-based Rectilinear Macro Placement Under Human Constraints," in Fast ML for Science Workshop, 2023. [Online]. Available: https://fastmachinelearning.org/iccad2023/file/fastml-iccad-23-final7.pdf
- [26] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," Aug. 2017, arXiv:1707.06347 [cs]. [Online]. Available: http://arxiv.org/abs/1707.06347
- [27] S. Huang and S. Ontañón, "A closer look at invalid action masking in policy gradient algorithms," *The International FLAIRS Conference Proceedings*, vol. 35, May 2022, arXiv:2006.14171 [cs, stat]. [Online]. Available: http://arxiv.org/abs/2006.14171
- [28] W. Zaremba and I. Sutskever, "Learning to Execute," Feb. 2015, arXiv:1410.4615 [cs]. [Online]. Available: http://arxiv.org/abs/1410.4615
- [29] H. Chen, M. Liu, B. Xu, K. Zhu, X. Tang, S. Li, Y. Lin, N. Sun, and D. Z. Pan, "Magical: An open-source fully automated analog ic layout system from netlist to gdsii," *IEEE Design & Test*, vol. 38, no. 2, pp. 19–26, Apr. 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9195880/
- [30] T. Dhar, K. Kunal, Y. Li, M. Madhusudan, J. Poojary, A. K. Sharma, W. Xu, S. M. Burns, R. Harjani, J. Hu et al., "Align: A system for automating analog layout," *IEEE Design & Test*, vol. 38, no. 2, pp. 8–18, 2020.
- [31] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, and Z. Zhang, "Deep graph library: A graph-centric, highly-performant package for graph neural networks," 2020. [Online]. Available: https://arxiv.org/abs/1909.01315
- [32] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, "Stable-baselines3: Reliable reinforcement learning implementations," *Journal of Machine Learning Research*, vol. 22, no. 268, pp. 1–8, 2021. [Online]. Available: http://jmlr.org/papers/v22/20-1364.html