# Optimiser Solver

[fpgaconvnet-optimiser](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser) introduces [`Solver`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/tree/dev-petros/fpgaconvnet/optimiser/solvers) to automatically apply the `transform`s and explore the design space. As before, we start the design space exploration (DSE) from the "resource-minimal" state.

In [None]:
from fpgaconvnet.parser.Parser import Parser

onnx_path = "../3.1_model_onnx_parser/fp16/vgg16_bn.onnx"
parser = Parser(custom_onnx=True, batch_size=1)
net = parser.onnx_to_fpgaconvnet(onnx_path)



In [2]:
import fpgaconvnet.optimiser.transforms as transforms

transforms.partition.split_complete(net, None)
for i, partition in enumerate(net.partitions):
    transforms.weights_reloading.apply_max_weights_reloading(partition)
    partition.update()

Creating a new [`Solver`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/tree/dev-petros/fpgaconvnet/optimiser/solvers) object requires two inputs: a [`Network`](https://github.com/AlexMontgomerie/fpgaconvnet-model/blob/dev-petros/fpgaconvnet/models/network/Network.py) and a [`Platform`](https://github.com/AlexMontgomerie/fpgaconvnet-model/blob/dev-petros/fpgaconvnet/platform/Platform.py), where the latter specifies the characteristics of the target FPGA device (available resources and bandwidth, target clock frequency, etc.).

In this example, let's target the deployment on a [`ZCU104`](https://www.xilinx.com/products/boards-and-kits/zcu104.html) device.

In [3]:
from fpgaconvnet.platform.Platform import Platform

platform = Platform()
platform.update("../../fpgaconvnet-optimiser/examples/platforms/zcu104.toml")
# We select port width 64 in this example
platform.port_width = 64

Currently, we provide two variants of `Solver`: [`simulated_annealing`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/blob/dev-petros/fpgaconvnet/optimiser/solvers/simulated_annealing.py) and [`greedy_partition`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/blob/dev-petros/fpgaconvnet/optimiser/solvers/greedy_partition.py).

[`simulated_annealing`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/blob/dev-petros/fpgaconvnet/optimiser/solvers/simulated_annealing.py) implements simulated annealing, a well-known stochastic optimisation algorithm. In each iteration of its main optimisation loop, a random change is applied to the accelerator configuration. To use the [`simulated_annealing`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/blob/dev-petros/fpgaconvnet/optimiser/solvers/simulated_annealing.py) solver, run the following code, which might take several minutes.

In [4]:
import pathlib
from fpgaconvnet.optimiser.solvers import SimulatedAnnealing

opt = SimulatedAnnealing(net, platform)
opt.bram_to_lut = False
opt.off_chip_streaming = False
opt.balance_bram_uram = False
opt.rsc_allocation = 0.75


opt.run_solver()
opt.update_partitions()

pathlib.Path("sa").mkdir(parents=True, exist_ok=True)
opt.create_report("sa/report.json")
opt.net.save_all_partitions("sa/config.json")

create edge ('Conv_19', 'Relu_20')
create edge ('Conv_12', 'Relu_13')
create edge ('Relu_6', 'Conv_7')
create edge ('Relu_11', 'Conv_12')
create edge ('Relu_35', 'Gemm_36')
create edge ('Relu_13', 'Conv_14')
create edge ('Conv_7', 'Relu_8')
create edge ('Conv_17', 'Relu_18')
╔══════════════╦════╦══════════════════╦═══════╦══╦═══════════════════════╦════╗
║ temperature: ║ 10 ║ Min temperature: ║ 0.007 ║  ║ number of partitions: ║ 36 ║
╚══════════════╩════╩══════════════════╩═══════╩══╩═══════════════════════╩════╝
| COST:                 |    | RESOURCES:   |          |           |                |                |            |            |            |            |
|-----------------------|----|--------------|----------|-----------|----------------|----------------|------------|------------|------------|------------|
|                       |    | URAM         | BRAM     | DSP       | LUT            | FF             | BW         | BW_IN      | BW_OUT     | BW_WEIGHT  |
| 0.148304 (thro
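
The random-change loop described above can be illustrated with a generic simulated annealing sketch. This is not the `SimulatedAnnealing` implementation from fpgaconvnet-optimiser: the toy problem (`WORK`, `BUDGET`, `toy_cost`, `random_change`) and the cooling rate are assumptions made for illustration only. The start and minimum temperatures mirror the values printed in the solver output above.

```python
import math
import random

# Hypothetical toy problem: each layer has a fixed amount of work and an
# integer parallelism factor; the sum of all factors is capped by a budget.
WORK = (8.0, 4.0, 2.0)
BUDGET = 7


def toy_cost(factors):
    """Latency of the slowest layer, or infinity if the budget is exceeded."""
    if sum(factors) > BUDGET:
        return math.inf
    return max(w / f for w, f in zip(WORK, factors))


def random_change(factors, rng):
    """Increase or decrease the factor of one randomly chosen layer by 1."""
    i = rng.randrange(len(factors))
    changed = list(factors)
    changed[i] = max(1, changed[i] + rng.choice((-1, 1)))
    return tuple(changed)


def simulated_annealing(state, t_start=10.0, t_min=0.007, cooling=0.98, seed=0):
    """Minimise toy_cost; `cooling` is an assumed geometric cooling rate."""
    rng = random.Random(seed)
    current, current_cost = state, toy_cost(state)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_min:
        candidate = random_change(current, rng)
        candidate_cost = toy_cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse candidate with
        # probability exp(-delta / t), which shrinks as t cools down.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


initial = (1, 1, 1)
best, best_cost = simulated_annealing(initial)
print(best, best_cost)
```

Because worse candidates are sometimes accepted while the temperature is high, the search can escape local minima; the result depends on the random seed, which is why repeated runs of a stochastic solver may return different designs.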

Alternatively, we provide the [`greedy_partition`](https://github.com/AlexMontgomerie/fpgaconvnet-optimiser/blob/dev-petros/fpgaconvnet/optimiser/solvers/greedy_partition.py) solver, a deterministic, greedy algorithm that iteratively optimises the slowest layer in the network.
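
The idea of repeatedly speeding up the slowest layer can be sketched on a toy model. This is only an illustration under assumed names, not the `GreedyPartition` code: `greedy_bottleneck`, the per-layer `work` values, and the single shared `budget` of parallelism units are all hypothetical.

```python
def greedy_bottleneck(work, budget):
    """Give one extra parallelism unit to the slowest layer until the
    (hypothetical) budget of units is used up."""
    factors = [1] * len(work)
    while sum(factors) < budget:
        latencies = [w / f for w, f in zip(work, factors)]
        # Index of the current bottleneck (first one wins on ties).
        slowest = max(range(len(work)), key=latencies.__getitem__)
        factors[slowest] += 1
    latency = max(w / f for w, f in zip(work, factors))
    return factors, latency


factors, latency = greedy_bottleneck([8.0, 4.0, 2.0], budget=7)
print(factors, latency)  # [4, 2, 1] 2.0
```

Unlike a stochastic search, this procedure makes the same choice at every step, so it returns the same result on every run.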

In [5]:
net = parser.onnx_to_fpgaconvnet(onnx_path)
transforms.partition.split_complete(net, None)
for i, partition in enumerate(net.partitions):
    transforms.weights_reloading.apply_max_weights_reloading(partition)
    partition.update()



In [6]:
import pathlib
from fpgaconvnet.optimiser.solvers import GreedyPartition

opt = GreedyPartition(net, platform)
opt.bram_to_lut = False
opt.off_chip_streaming = False
opt.balance_bram_uram = False
opt.rsc_allocation = 0.75
opt.transforms = []
opt.transforms_probs = []


opt.run_solver()
opt.update_partitions()
opt.merge_memory_bound_partitions()
opt.update_partitions()

pathlib.Path("gp").mkdir(parents=True, exist_ok=True)
opt.create_report("gp/report.json")
opt.net.save_all_partitions("gp/config.json")

╔══════════════════════════════════════════════╦═════════╦══╦═══════════╦═══╦══════════════════════════════╦══════════╦══════════════════════════╦══════════╗
║ single partition (Part 1) cost (throughput): ║ 90620.8 ║  ║ slowdown: ║ 1 ║ partition optimisation time: ║ 0.85 sec ║ total optimisation time: ║ 0.85 sec ║
╚══════════════════════════════════════════════╩═════════╩══╩═══════════╩═══╩══════════════════════════════╩══════════╩══════════════════════════╩══════════╝
| COST:                 |    | RESOURCES:   |           |            |                |                |            |            |            |            |
|-----------------------|----|--------------|-----------|------------|----------------|----------------|------------|------------|------------|------------|
|                       |    | URAM         | BRAM      | DSP        | LUT            | FF             | BW         | BW_IN      | BW_OUT     | BW_WEIGHT  |
| 0.148526 (throughput) |    | 0.00/96      | 15.61/624

Note that the choice of solver depends on the specific case.
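
One way to inform that choice is to inspect the reports written by both runs side by side. The sketch below assumes only that `sa/report.json` and `gp/report.json` (the paths used above) are valid JSON; it makes no assumption about their schema and simply lists the top-level keys. `load_report` is a hypothetical helper, not part of fpgaconvnet-optimiser.

```python
import json
from pathlib import Path


def load_report(path):
    """Return the parsed JSON report, or None if the file does not exist."""
    report_path = Path(path)
    if not report_path.exists():
        return None
    with report_path.open() as f:
        return json.load(f)


for name in ("sa", "gp"):
    report = load_report(f"{name}/report.json")
    if report is None:
        print(f"{name}: no report found")
    elif isinstance(report, dict):
        print(f"{name}: top-level keys = {sorted(report)}")
    else:
        print(f"{name}: report is a {type(report).__name__}")
```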