# Simulating an Integer Vector MAC w/ Arbolta
In this notebook we will:
- Synthesize an integer vector MAC design with Yosys
- Simulate our design and collect some statistics

## Design Synthesis
We'll be looking at a basic integer vector MAC design in `designs/int_vector_mac.sv` whose top-module interface looks like:
```systemverilog
module int_vector_mac #(
  parameter  int unsigned DataWidth,
  parameter  int unsigned Size,
  parameter  int unsigned AccumulatorWidth
)(
  input  logic                               clock,
  input  logic                               reset_i,
  input  logic signed [DataWidth-1:0]        op0_vec_i [Size],
  input  logic signed [DataWidth-1:0]        op1_vec_i [Size],
  output logic signed [AccumulatorWidth-1:0] mac_o
);
```
This design combinationally computes the dot product of two signed-integer vectors (`op0_vec_i`, `op1_vec_i`) and adds it to an accumulator register (`mac_o`) on every clock cycle. Resetting the design clears the accumulator register.
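To make the intended behavior concrete, here is a minimal, hypothetical Python reference model of the MAC. It assumes the accumulator wraps on overflow like an `AccumulatorWidth`-bit two's-complement register; `IntVectorMacModel` and `wrap_signed` are illustrative names, not part of Arbolta or the design files.

```python
import numpy as np


def wrap_signed(value: int, width: int) -> int:
    """Wrap an unbounded Python int into a width-bit two's-complement value."""
    value &= (1 << width) - 1
    # If the sign bit is set, map back into the negative range
    return value - (1 << width) if value >> (width - 1) else value


class IntVectorMacModel:
    """Hypothetical cycle-level reference model of int_vector_mac (not an Arbolta API)."""

    def __init__(self, accumulator_width: int) -> None:
        self.accumulator_width = accumulator_width
        self.mac_o = 0

    def reset(self) -> None:
        # Reset clears the accumulator register
        self.mac_o = 0

    def clock_edge(self, op0_vec: np.ndarray, op1_vec: np.ndarray) -> int:
        # Combinational dot product, widened to int64 so int8 products cannot overflow
        dot = int(np.dot(op0_vec.astype(np.int64), op1_vec.astype(np.int64)))
        # Register update on the clock edge: mac_o <= mac_o + dot (assumed to wrap)
        self.mac_o = wrap_signed(self.mac_o + dot, self.accumulator_width)
        return self.mac_o


model = IntVectorMacModel(accumulator_width=32)
a = np.array([1, -2, 3], dtype=np.int8)
b = np.array([4, 5, -6], dtype=np.int8)
print(model.clock_edge(a, b))  # 4 - 10 - 18 = -24
print(model.clock_edge(a, b))  # accumulates to -48
```

A model like this gives an independent expectation to compare simulated `mac_o` values against, including across several clock cycles without a reset.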

### Cell Library
Our RTL will be synthesized to the cells described in the Liberty cell library `cells/cells.lib`. This library defines the basic cells `BUF`, `NOT`, `NAND`, `NOR`, and `DFF`. Each cell's area is roughly equal to its CMOS transistor count.
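Because each cell's area approximates its transistor count, a design's total area is the count-weighted sum of its cells' areas. A minimal sketch: the per-cell areas below are textbook static-CMOS transistor counts used purely for illustration (two-input `NAND`/`NOR` assumed), not values read from `cells/cells.lib`, and `DFF` is left out because its transistor count depends on the flip-flop circuit. `total_area` and `ILLUSTRATIVE_AREAS` are hypothetical names.

```python
from collections.abc import Mapping

# Illustrative areas (~transistor counts) for static CMOS cells;
# assumed values, NOT read from cells/cells.lib
ILLUSTRATIVE_AREAS = {"NOT": 2, "BUF": 4, "NAND": 4, "NOR": 4}


def total_area(cell_counts: Mapping[str, int], cell_areas: Mapping[str, float]) -> float:
    """Sum count * area over all cell types; fail loudly on cells with no known area."""
    missing = set(cell_counts) - set(cell_areas)
    if missing:
        raise KeyError(f"No area known for cells: {sorted(missing)}")
    return sum(count * cell_areas[cell] for cell, count in cell_counts.items())


# Hypothetical cell counts for a tiny circuit
print(total_area({"NAND": 10, "NOT": 3}, ILLUSTRATIVE_AREAS))  # 10*4 + 3*2 = 46
```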

### Running Synthesis

In [None]:
# Ensure that Yosys is present
# or install from https://github.com/YosysHQ/oss-cad-suite-build
! which yosys

The Yosys synthesis script for our design is `synth.tcl`, which can be run from the command line using:
```bash
yosys -c synth.tcl -- <parameter>=<val> ...
```

The `<parameter>=<val>` arguments can be used to set or override the module parameters of `int_vector_mac`. For example, we can synthesize a 16-element, 8-bit integer MAC with a 32-bit accumulator using:

```bash
yosys -c synth.tcl -- DataWidth=8 Size=16 AccumulatorWidth=32
```
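For a single MAC from a cleared accumulator, 32 bits is comfortably wide: each signed 8-bit product lies in [-16256, 16384], so a sum of 16 products needs at most 2·DataWidth + ⌈log₂ Size⌉ = 20 signed bits. A minimal sketch that computes the exact minimum width; `min_accumulator_width` is a hypothetical helper, and it does not cover accumulating several MACs without a reset, which needs about ⌈log₂ K⌉ extra bits for K accumulated operations.

```python
def min_accumulator_width(data_width: int, size: int) -> int:
    """Smallest signed width holding any dot product of two length-`size`
    vectors of signed `data_width`-bit integers (one MAC from a cleared accumulator)."""
    lo, hi = -(1 << (data_width - 1)), (1 << (data_width - 1)) - 1
    most_positive = size * lo * lo  # (-2^(N-1))^2, summed `size` times
    most_negative = size * lo * hi  # (-2^(N-1)) * (2^(N-1) - 1), summed `size` times
    # A signed w-bit value covers [-2^(w-1), 2^(w-1) - 1]
    pos_bits = most_positive.bit_length() + 1
    neg_bits = (-most_negative - 1).bit_length() + 1
    return max(pos_bits, neg_bits)


print(min_accumulator_width(8, 16))  # 20
```

The closed form 2·DataWidth + ⌈log₂ Size⌉ is only an upper bound; for example, `min_accumulator_width(8, 3)` is 17 rather than 18.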

The synthesis script will generate the following in the `output` directory:
- `schematic.dot`: [DOT](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) graph of our elaborated design
- `synth.json`: Synthesis netlist in JSON format
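Since `synth.json` is a Yosys JSON netlist, it can also be inspected without Arbolta. A minimal sketch, assuming the usual layout of Yosys' JSON output (a top-level `"modules"` dict whose entries hold a `"cells"` dict, each cell carrying a `"type"`); `count_cell_types` and `load_netlist` are hypothetical helpers, and they count only cells instantiated directly in the named module, not those inside its submodules.

```python
import json
from collections import Counter


def count_cell_types(netlist: dict, module: str) -> Counter:
    """Count cell types instantiated directly in `module` of a Yosys JSON netlist."""
    cells = netlist["modules"][module].get("cells", {})
    return Counter(cell["type"] for cell in cells.values())


def load_netlist(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


# Usage after synthesis (the module key is an assumption; Yosys may rename
# parameterized modules):
# print(count_cell_types(load_netlist("output/synth.json"), "int_vector_mac"))
```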

**We will use the function `run_synth`, a wrapper around invoking `synth.tcl`.**

In [None]:
from typing import TypedDict
import subprocess

# Class to hold configuration for our int vector MAC
class SynthConfig(TypedDict):
    DataWidth: int
    Size: int
    AccumulatorWidth: int

# Run synth.tcl with our parameters
def run_synth(synth_config: SynthConfig) -> None:
    synth_params = [f"{p_name}={p_val}" for (p_name, p_val) in synth_config.items()]
    command = ['yosys', '-c', 'synth.tcl', '--', *synth_params]
    # Capture Yosys' log; check=True raises CalledProcessError if synthesis fails
    subprocess.run(command, capture_output=True, text=True, check=True)

### Loading Design into Arbolta
Arbolta needs to know what each port of our design does and how to interpret its bits. This is done with the configuration class `DesignConfig`, a `dict` wrapper that lets us configure each port with a `PortConfig` object whose syntax looks like:

```Python
PortConfig(shape=(int, int), dtype=np.dtype, clock=bool, reset=bool)
```

Let's continue with our example of a 16-element, 8-bit integer MAC with a 32-bit accumulator.

In [None]:
import numpy as np
from arbolta import HardwareDesign, DesignConfig, PortConfig

DESIGN_CONFIG = DesignConfig(
    clock     = PortConfig(clock=True), # Don't need to specify shape
    reset_i   = PortConfig(reset=True), # Don't need to specify shape
    op0_vec_i = PortConfig(shape=(1, 16), dtype=np.int8),
    op1_vec_i = PortConfig(shape=(1, 16), dtype=np.int8),
    mac_o     = PortConfig(shape=(1, 1), dtype=np.int32)
)

SYNTH_CONFIG = SynthConfig(
    DataWidth        = 8,
    Size             = 16,
    AccumulatorWidth = 32
)

run_synth(SYNTH_CONFIG)

design = HardwareDesign("int_vector_mac", "output/synth.json", DESIGN_CONFIG)
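The port shapes and dtypes above are hard-coded to match `SYNTH_CONFIG`. One way to keep the two in sync is to derive them from the synthesis parameters. A minimal sketch, assuming each port's numpy dtype only needs to be the smallest signed integer type that holds its bit width (whether Arbolta accepts a dtype wider than the port, e.g. `np.int16` for a 12-bit port, is an assumption); `signed_dtype` and `port_specs` are hypothetical helpers.

```python
import numpy as np


def signed_dtype(width: int) -> type:
    """Smallest numpy signed integer type with at least `width` bits."""
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        if width <= np.iinfo(dtype).bits:
            return dtype
    raise ValueError(f"No numpy signed integer type holds {width} bits")


def port_specs(config: dict) -> dict:
    """Map data-port names to (shape, dtype), mirroring the hand-written DESIGN_CONFIG."""
    data_dtype = signed_dtype(config["DataWidth"])
    return {
        "op0_vec_i": ((1, config["Size"]), data_dtype),
        "op1_vec_i": ((1, config["Size"]), data_dtype),
        "mac_o": ((1, 1), signed_dtype(config["AccumulatorWidth"])),
    }


print(port_specs({"DataWidth": 8, "Size": 16, "AccumulatorWidth": 32}))
```

Each entry could then be passed as `PortConfig(shape=shape, dtype=dtype)`, alongside the clock and reset ports.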

(*optional*) Explore the elaborated design (pre-synthesis):

In [None]:
from pathlib import Path
from graphviz_anywidget import graphviz_widget

graphviz_widget(Path("output/schematic.dot").read_text())

## Simulating our Design
### Cell Breakdown
See how many of each cell described in `cells/cells.lib` appear in the synthesized design:

In [None]:
from pandas import DataFrame

# Cell breakdown for design
cell_df = DataFrame({design.top_module: design.cell_breakdown()})
cell_df
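To see which cell types dominate, the counts can be normalized into percentages. A minimal sketch with pandas, assuming `cell_df` holds one numeric column (named after the top module) indexed by cell type, as built above; `add_share` is a hypothetical helper and the toy counts are made up.

```python
from pandas import DataFrame


def add_share(df: DataFrame, column: str) -> DataFrame:
    """Return a copy of `df` with a '% of cells' column computed from `column`."""
    out = df.copy()
    out["% of cells"] = 100 * out[column] / out[column].sum()
    return out.sort_values("% of cells", ascending=False)


# Toy counts; in the notebook this would be add_share(cell_df, design.top_module)
toy = DataFrame({"top": {"NAND": 30, "NOT": 10}})
print(add_share(toy, "top"))
```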

### Area Breakdown
See how much area is taken up by each submodule of our design:

In [None]:
# Extract the HDL module name from a (possibly parameterized) Yosys module name
def filter_module_name(name: str) -> str:
    return next(part for part in name.split("\\")
                if not part.startswith('$') and '=' not in part)

# Get area breakdown for design
# Area ~= transistor count
area_df = DataFrame([{'module': filter_module_name(module), 
                      'area': design.area(module)} for module in design.module_names()])

area_df

In [None]:
%matplotlib widget
from matplotlib import pyplot as plt

# Plot area breakdown
area_df.plot(kind='barh', x='module', xlabel='Area (# of Transistors)', ylabel='')
plt.tight_layout()
plt.show()

### Power Analysis
Throughout the simulation, Arbolta records the number of times each signal has been toggled. We can use these bit-flips as a proxy for dynamic power. 
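This proxy follows from the usual first-order model of CMOS switching power, where $\alpha_i$ is the average number of toggles of net $i$ per clock cycle, $C_i$ its switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. If every net is treated as having roughly the same capacitance $C_{\text{net}}$ (a simplifying assumption; real nets drive different loads), the dynamic energy of a run scales with its total toggle count $N_{\text{toggles}}$:

$$
P_{\text{dyn}} \approx \tfrac{1}{2}\,V_{DD}^{2}\,f \sum_i \alpha_i C_i
\qquad\Longrightarrow\qquad
E_{\text{dyn}} \approx \tfrac{1}{2}\,C_{\text{net}}\,V_{DD}^{2}\,N_{\text{toggles}}
$$

Comparing toggle counts therefore compares dynamic energy only up to this shared constant, and only under the equal-capacitance assumption.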

We can run random inputs through our design to find the average number of bit-flips it incurs per MAC operation.

In [None]:
# Helper function to pass arbitrary numpy arrays through our MAC design
# Expects inputs to have shape (runs, vector size, 2)
# Returns the average # of bit-flips per MAC operation
def run_mac(design: HardwareDesign, inputs: np.ndarray) -> float:
    runs = inputs.shape[0]
    actual_mac = np.zeros(runs, dtype=np.int32)

    design.reset() # Reset all toggle counts, signals, and flip-flops
    for i, input_pair in enumerate(inputs):
        design.reset_clocked() # Reset accumulator
        design.ports.op0_vec_i = input_pair[:, 0]
        design.ports.op1_vec_i = input_pair[:, 1]
        design.eval_clocked() # Do MAC
        
        actual_mac[i] = design.ports.mac_o.item()

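    # Check correctness of design against a NumPy reference
    # (widen to int64 first so the products cannot overflow small input dtypes)
    wide = inputs.astype(np.int64)
    expected_mac = (wide[:, :, 0] * wide[:, :, 1]).sum(axis=1)
    assert np.array_equal(actual_mac, expected_mac)

    return design.total_toggle_count() / runs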

In [None]:
INT8_MIN, INT8_MAX = -128, 127 # Range of our operand datatype
VECTOR_SIZE        = 16        # Size of int vector MAC
RUNS               = 1000      # Number of MAC operations to average over

# Generate random, uniform inputs
inputs = np.random.randint(INT8_MIN, INT8_MAX + 1, (RUNS, VECTOR_SIZE, 2))

average_toggles = run_mac(design, inputs)
print(f"Average bit-flips per MAC operation = {average_toggles}")
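Part of that total comes from the input ports themselves. For uniformly random operands, each of an operand's 8 bits is independently equally likely to be 0 or 1, so between two consecutive random values each bit flips with probability 1/2. That gives about 4 flips per operand and roughly 16 × 2 × 4 = 128 input-port flips per MAC operation. A minimal sketch that measures this directly on random inputs; `bit_flips` and `avg_input_flips` are hypothetical helpers that only count two's-complement bit differences between consecutive operands, not toggles inside the design.

```python
import numpy as np


def bit_flips(prev: np.ndarray, curr: np.ndarray) -> int:
    """Count bits that differ between two int8-range arrays (two's-complement view)."""
    diff = np.bitwise_xor(prev.astype(np.int8).view(np.uint8),
                          curr.astype(np.int8).view(np.uint8))
    return int(np.unpackbits(diff).sum())


def avg_input_flips(inputs: np.ndarray) -> float:
    """Average input-operand bit flips between consecutive MAC operations."""
    flips = [bit_flips(inputs[i - 1], inputs[i]) for i in range(1, len(inputs))]
    return float(np.mean(flips))


rng = np.random.default_rng(0)
demo = rng.integers(-128, 128, size=(1000, 16, 2))
print(avg_input_flips(demo))  # close to 16 * 2 * 4 = 128
```

Comparing this estimate with `average_toggles` gives a rough sense of how much activity is internal to the MAC rather than driven by the inputs.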

In [None]:
# Get power breakdown per-submodule
power_df = DataFrame([{'module': filter_module_name(module), 
                      'toggles': design.total_toggle_count(module)} for module in design.module_names()])
power_df['toggles'] = power_df['toggles'].div(RUNS)

power_df
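Larger modules naturally toggle more, so it can also help to normalize activity by area. A minimal sketch, assuming `area_df` and `power_df` list modules in the same row order (both are built by iterating `design.module_names()`); combining row by row avoids the duplicate matches a merge on `module` could produce if two submodules reduce to the same filtered name. `activity_density` is a hypothetical helper and the toy numbers are made up.

```python
from pandas import DataFrame


def activity_density(area_df: DataFrame, power_df: DataFrame) -> DataFrame:
    """Combine area and toggle tables row by row and add toggles per unit area."""
    if list(area_df["module"]) != list(power_df["module"]):
        raise ValueError("area_df and power_df must list modules in the same order")
    out = area_df.copy()
    out["toggles"] = power_df["toggles"].to_numpy()
    out["toggles per area"] = out["toggles"] / out["area"]
    return out


toy_area = DataFrame({"module": ["adder", "multiplier"], "area": [100, 400]})
toy_power = DataFrame({"module": ["adder", "multiplier"], "toggles": [50.0, 100.0]})
print(activity_density(toy_area, toy_power))
```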

In [None]:
# Plot power breakdown
power_df.plot(kind='barh', x='module', xlabel='Avg. Bit-Flips per MAC Operation', ylabel='')
plt.tight_layout()
plt.show()