# Circuit Training

This environment is included in A2Perf.

![The Ariane RISC-V CPU](../../_static/img/CircuitTraining-Ariane-v0.gif)

## Description
Chip floorplanning involves designing the physical layout of a computer chip. Traditionally, it requires months of manual effort by physical design engineers. The article [`A graph placement methodology for fast chip design`](https://www.nature.com/articles/s41586-021-03544-w.epdf?sharing_token=tYaxh2mR5EozfsSL0WHZLdRgN0jAjWel9jnR3ZoTv0PW0K0NmVrRsFPaMa9Y5We9O4Hqf_liatg-lvhiVcYpHL_YQpqkurA31sxqtmA-E1yNUWVMMVSBxWSp7ZFFIWawYQYnEXoBE4esRDSWqubhDFWUPyI5wK_5B_YIO-D_kS8%3D) describes using reinforcement learning to generate chip floorplans. In less than six hours, this method can generate chip floorplans that match or exceed human designs on key metrics such as power consumption, performance, and chip area. It has already been used to design Google's next-generation AI accelerators and has the potential to save significant human labor in future chip designs.

This environment is based on the environment provided in the original article and allows training on the Ariane RISC-V CPU or on a toy problem with a toy macro dataset.

Note: this environment is only supported on Linux-based operating systems.

## Action Space
The action space consists of all locations on which the current macro can be placed without violating any hard constraints on density or blockages. At each step, the agent places one macro, so macros are placed sequentially. Once all macros are placed, a force-directed method is used to approximately place clusters of standard cells. The output of the policy network is a probability distribution over placements of the current macro on the $m \times n$ grid that represents the canvas. The action is then sampled from this probability distribution.
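As a rough illustration of sampling an action from a distribution over grid cells while respecting hard constraints, the sketch below zeroes out infeasible cells and samples from the remaining weights. The function names and the row-major mapping from a flat index to a grid cell are assumptions made for this example, not part of the environment's API.

```python
import random


def sample_masked_action(probs, mask, rng=random):
    """Sample a flat grid index from `probs`, restricted to cells where mask is 1.

    `probs` and `mask` are flat sequences with one entry per grid cell.
    """
    if len(probs) != len(mask):
        raise ValueError("probs and mask must have the same length")
    # Infeasible cells (mask == 0) get zero weight.
    weights = [p if m else 0.0 for p, m in zip(probs, mask)]
    if sum(weights) <= 0.0:
        raise ValueError("no feasible cell has positive probability")
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def to_grid_cell(action, n_cols):
    """Convert a flat index to (row, col), assuming row-major order (an assumption)."""
    return divmod(action, n_cols)
```

For example, with four cells of equal probability and a mask of `[0, 1, 0, 1]`, only indices 1 and 3 can be sampled.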

## Observation Space
In the original article, the observation encodes information about the partial placement: an adjacency matrix representing the netlist; the node features (width, height, and type of each node); the edge features (the number of connections); the current node to be placed (a macro); and the metadata of the netlist graph, which includes routing allocations and the total number of wires, macros, and standard cell clusters.

The observation space encodes information about the partial placement of the circuit. This includes:

- `current_node`: the current node to be placed, which is a single integer ranging from 0 to 3499.
- `fake_net_heatmap`: a fake net heatmap, which provides a continuous representation of the heatmap with values between 0.0 and 1.0 across 16,384 points.
- `is_node_placed`: the placement status of nodes, a binary array of size 3500, showing whether each node has been placed (1) or not (0).
- `locations_x`: node locations in the x-axis, a continuous array of size 3500 with values ranging from 0.0 to 1.0, representing the x-coordinates of the nodes.
- `locations_y`: node locations in the y-axis, similar to locations_x, but for the y-coordinates.
- `mask`: a mask, a binary array of size 16,384 indicating the validity or usability of each point in the net heatmap.
- `netlist_index`: a netlist index, which seems to be a placeholder in this case, fixed at 0.
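To make the relationship between `is_node_placed`, `locations_x`, and `locations_y` concrete, here is a minimal sketch that lists the coordinates of the nodes already placed. The helper name `placed_node_locations` and the tiny hand-made observation are hypothetical; a real observation would hold 3500 entries per array.

```python
def placed_node_locations(obs):
    """Return (node_index, x, y) for every node marked as placed."""
    placed = obs["is_node_placed"]
    xs = obs["locations_x"]
    ys = obs["locations_y"]
    if not (len(placed) == len(xs) == len(ys)):
        raise ValueError("per-node arrays must have the same length")
    return [(i, xs[i], ys[i]) for i, flag in enumerate(placed) if flag == 1]


# Hand-made observation with 4 nodes instead of 3500 (illustrative values only).
example_obs = {
    "is_node_placed": [1, 0, 1, 0],
    "locations_x": [0.1, 0.0, 0.5, 0.0],
    "locations_y": [0.2, 0.0, 0.6, 0.0],
}
placed = placed_node_locations(example_obs)
```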

## Rewards
The environment uses a sparse reward structure: it returns `0` for every action except the last one, where the reward of the rollout is the negative weighted sum of proxy wirelength, congestion, and density.

More formally, the calculation is described by the following equation:

$$
\begin{aligned}
R_{p,g} &= -Wirelength(p,g) - \lambda \, Congestion(p,g) - \gamma \, Density(p,g)\\
\lambda &= 0.01\\
\gamma &= 0.01
\end{aligned}
$$

Here, $g$ is the netlist and $p$ is the placement drawn from the policy distribution $\pi_{\theta}$.
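The reward structure can be sketched in a few lines of Python. This is only an illustration of the formula, assuming the three proxy costs have already been computed; the function names are hypothetical and are not the environment's own cost function.

```python
def placement_reward(wirelength, congestion, density,
                     congestion_weight=0.01, density_weight=0.01):
    """Negative weighted sum of the proxy costs, with lambda = gamma = 0.01."""
    return -(wirelength
             + congestion_weight * congestion
             + density_weight * density)


def step_reward(is_last_step, wirelength=0.0, congestion=0.0, density=0.0):
    """Sparse reward: 0 for every step except the last one of the episode."""
    if not is_last_step:
        return 0.0
    return placement_reward(wirelength, congestion, density)
```

With a wirelength of 1.0, a congestion of 2.0, and a density of 3.0, the final reward is $-(1.0 + 0.02 + 0.03) = -1.05$.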


The wirelength is calculated using the half-perimeter wirelength (HPWL) approximation. For each net, it is the half-perimeter of the bounding box that encloses all nodes of that net. For a given net $i$, the HPWL can be calculated as:
$$
HPWL(i) = (\max_{b\in i}\{x_b\} - \min_{b\in i}\{x_b\} + 1) + (\max_{b\in i}\{y_b\} - \min_{b\in i}\{y_b\} + 1)
$$
Where $x_b$ and $y_b$ represent the coordinates of the end points of the net. The overall HPWL cost of the chip is calculated as a weighted sum of all half-perimeter bounding boxes. 
$$
HPWL(netlist) = \sum_{i=1}^{N_{netlist}} q(i) HPWL(i) 
$$
Here, $q(i)$ is a normalization factor that improves accuracy of the estimate by increasing the cost of the wirelength as the number of nodes increases.
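The two HPWL formulas translate directly into code. In the sketch below, each net is a list of `(x, y)` end points. The exact form of $q(i)$ is not given here, so it is passed in as a hypothetical callable that maps the number of end points to a weight, defaulting to 1.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net, given its (x, y) end points."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs) + 1) + (max(ys) - min(ys) + 1)


def netlist_hpwl(nets, q=None):
    """Weighted sum of per-net HPWL values.

    `q` is a placeholder for the normalization factor q(i); it receives the
    number of end points of the net. Its real form is an open assumption.
    """
    if q is None:
        q = lambda num_pins: 1.0
    return sum(q(len(pins)) * hpwl(pins) for pins in nets)
```

For a net with end points `(0, 0)` and `(3, 4)`, the HPWL is $(3 - 0 + 1) + (4 - 0 + 1) = 9$.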

The congestion is calculated via a proxy congestion, using a simple deterministic routing based on the locations of the driver and loads on the net. We keep track of vertical and horizontal allocations in each grid cell separately. To smooth the congestion estimate, we run $5\times1$ convolutional filters in both the vertical and horizontal directions. After all nets are routed, we take the average of the top 10% of congestion values, drawing inspiration from the ABA10 metric in MAPLE. The congestion cost is the top 10% average congestion calculated by this process.
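The smoothing and top-10% averaging steps can be sketched as follows. This is not the environment's implementation: the filter is assumed to be a plain box filter truncated at the grid edges, horizontal allocations are assumed to be smoothed along rows and vertical allocations along columns, and all function names are hypothetical. The routing step that produces the allocations is omitted.

```python
import math


def box_smooth(values, width=5):
    """Average each entry with its neighbours in a window of `width` cells."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def top_fraction_mean(values, fraction=0.10):
    """Mean of the largest `fraction` of the values (at least one value)."""
    # The small epsilon guards against floating-point error in fraction * n.
    k = max(1, math.ceil(fraction * len(values) - 1e-9))
    return sum(sorted(values, reverse=True)[:k]) / k


def congestion_cost(h_grid, v_grid, width=5, fraction=0.10):
    """Smooth both allocation grids, then average the top congestion values."""
    h_smooth = [box_smooth(list(row), width) for row in h_grid]
    # zip(*grid) iterates over columns, so vertical values are smoothed along columns.
    v_smooth = [box_smooth(list(col), width) for col in zip(*v_grid)]
    values = [v for line in h_smooth + v_smooth for v in line]
    return top_fraction_mean(values, fraction)
```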

## Starting State
The starting state of each episode depends on the problem you have chosen, `Ariane` or `ToyMacro`. The starting state does not depend on anything else and will be constant throughout its usage.

## Episode End
The episode ends when all nodes have been placed.

## Arguments
When creating the circuit training environment, there are parameters we can define to further specify its behaviour. First, we can create either a toy environment or an environment based on the Ariane RISC-V CPU. The two environments load different netlist and initial placement files.
```python
import gymnasium as gym
import a2perf.domains.circuit_training

# Create the Ariane RISC-V CPU environment with its default parameters.
env = gym.make('CircuitTraining-Ariane-v0')
```
Or create the `ToyMacro` environment:
```python
# Create the toy environment with its default parameters.
env = gym.make('CircuitTraining-ToyMacro-v0')
```
#### Required parameters:

Creating the environment for either `Ariane` or `ToyMacro` predefines all required parameters, so you can start using the environment right away.

#### Optional parameters:

| Parameter                | Type             | Default                              | Description |
|--------------------------|------------------|--------------------------------------|-------------|
| `netlist_file`           | str              | path to `netlist.pb.txt`             | Path to the input netlist file. Predefined by using `Ariane` or `ToyMacro`.|
| `init_placement`         | str              | path to `initial.plc`                | Path to the input initial placement file, used to read grid and canvas size. Predefined by using `Ariane` or `ToyMacro`.|
| `plc_wrapper_main`       | str              | `a2perf/domains/circuit_training/bin/plc_wrapper_main`| Main PLC wrapper. |
| `create_placement_cost_fn` | Callable        | `placement_util.create_placement_cost` | A function that creates the `PlacementCost` object given the netlist and initial placement file. |
| `std_cell_placer_mode`   | str              | `'fd'`                               | Options for fast standard cells placement. The `fd` option uses the force-directed algorithm. |
| `cost_info_fn`           | Callable         | `cost_info_function`                 | The cost function that, given the `plc` object, returns the RL cost. |
| `global_seed`            | int              | `0`                                  | Global seed for initializing environment features, ensuring consistency across actors. |
| `netlist_index`          | int              | `0`                                  | Netlist index in the model static features. |
| `is_eval`                | bool             | `False`                              | If set, saves the final placement in `output_dir`. |
| `save_best_cost`         | bool             | `False`                              | If set, saves the placement if its cost is better than the previously saved placement. |
| `output_plc_file`        | str              | `''`                                 | The path to save the final placement. |
| `cd_finetune`            | bool             | `False`                              | If True, runs coordinate descent to fine-tune macro orientations. Meant for evaluation, not training. |
| `cd_plc_file`            | str              | `'ppo_cd_placement.plc'`             | Name of the coordinate descent fine-tuned `plc` file, saved in the same directory as `output_plc_file`. |
| `train_step`             | Optional[tf.Variable] | `None`                           | A `tf.Variable` indicating the training step, used for saving `plc` files during evaluation. |
| `output_all_features`    | bool             | `False`                              | If true, outputs all observation features. Otherwise, only outputs dynamic observations. |
| `node_order`             | str              | `'descending_size_macro_first'`      | The sequence order of nodes placed by RL. |
| `save_snapshot`          | bool             | `True`                               | If true, saves the snapshot placement. |
| `save_partial_placement` | bool             | `False`                              | If true, evaluation also saves the placement even if RL does not place all nodes when an episode is done. |
| `use_legacy_reset`       | bool             | `False`                              | If true, uses the legacy reset method. |
| `use_legacy_step`        | bool             | `False`                              | If true, uses the legacy step method. |
| `render_mode`            | str              | `None`                               | Specifies the rendering mode `human` or `rgb_array`, if any. |


## Version History
- v0: Initial version release