# Advanced Builder settings

<font color="red">**Live FINN tutorial:** We recommend clicking **Cell -> Run All** when you start reading this notebook for "latency hiding".</font>

<img align="left" src="../end2end_example/cybersecurity/finn-example.png" alt="drawing" style="margin-right: 20px" width="250"/>

In this notebook, we'll use the FINN compiler to generate an FPGA accelerator with a streaming dataflow architecture from a small convolutional network trained on CIFAR-10. The key idea in such architectures is to parallelize across layers as well as within layers by dedicating a proportionate amount of compute resources to each layer, as illustrated in the figure to the left. This is done by mapping each layer to a Vitis HLS description, parallelizing each layer's implementation to the appropriate degree, and using on-chip FIFOs to link up the layers to create the full accelerator. You can read more about the general concept in the [FINN](https://arxiv.org/pdf/1612.07119) and [FINN-R](https://dl.acm.org/doi/pdf/10.1145/3242897) papers.

These implementations offer a good balance of performance and flexibility, but building them by hand is difficult and time-consuming. This is where the FINN compiler comes in: it can build streaming dataflow accelerators from an ONNX description to match the desired throughput.
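
As a rough illustration of what "matching the desired throughput" means, the sketch below models a streaming pipeline whose steady-state throughput is set by its slowest layer. The layer names and cycle counts are hypothetical placeholders, not measurements from any FINN build, and `pipeline_fps` is a helper written only for this example.

```python
# Hypothetical per-layer cycle counts for one input frame (illustrative only).
layer_cycles = {"conv0": 4000, "conv1": 9000, "fc0": 2500}


def pipeline_fps(cycles_per_layer, clk_period_ns):
    """Estimate steady-state frames per second of a streaming pipeline.

    All layers work concurrently on different frames, so the slowest
    layer determines how often a new frame can be accepted.
    """
    bottleneck = max(cycles_per_layer.values())
    clk_hz = 1e9 / clk_period_ns
    return clk_hz / bottleneck


fps = pipeline_fps(layer_cycles, clk_period_ns=10.0)
print(f"bottleneck-limited throughput: {fps:.1f} FPS")
```

In this toy model, spending more resources on `conv1` (the bottleneck) is the only change that raises the throughput, which is why compute is distributed in proportion to each layer's workload.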

In this tutorial, we will have a more detailed look into the FINN builder tool and explore different options to customize your FINN design. We assume that you have already completed the [Cybersecurity notebooks](../end2end_example/cybersecurity) and that you have a basic understanding of how the FINN compiler works and how to use the FINN builder tool.

## Outline
---------------

1. [Introduction to the CNV-w2a2 network](#intro_cnv)
2. [Recap default builder flow](#recap_builder)
3. [Build steps](#build_step)
    1. [How to make a custom build step](#custom_step)
4. [Folding configuration json](#folding_config)
5. [Additional builder arguments](#builder_arg)
    1. [Verification steps](#verify)
    2. [Examples for additional builder arguments](#example_args)
    3. [Other builder arguments](#other_args)

## Introduction to the CNV-w2a2 network <a id="intro_cnv"></a>

The particular quantized neural network (QNN) we will be targeting in this notebook is referred to as CNV-w2a2 and it classifies 32x32 RGB images into one of ten CIFAR-10 classes. All weights and activations in this network are quantized to two bits, with the exception of the input (which is RGB with 8 bits per channel) and the final output (which consists of 32-bit numbers). It is similar to the convolutional neural network used in the [cnv_end2end_example](../end2end_example/bnn-pynq/cnv_end2end_example.ipynb) Jupyter notebook.
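
To put these bit widths into numbers, the short sketch below computes the size of one input image and the value range of a 2-bit integer. Whether the 2-bit values are signed or unsigned depends on the quantizers in the model, which the text above does not specify, so both ranges are shown. `int_range` is a helper written for this example.

```python
def int_range(bits, signed):
    """Return (min, max) representable by an integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# One CIFAR-10 image: 3 channels of 32x32 pixels, 8 bits per channel.
input_values = 3 * 32 * 32
input_bits = input_values * 8

print("values per image:", input_values)
print("bits per image:", input_bits)
print("2-bit signed range:", int_range(2, signed=True))
print("2-bit unsigned range:", int_range(2, signed=False))
```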


You'll have a chance to interactively examine the layers that make up the network in Netron in a moment, so that's enough about the network for now. 


In [None]:
from finn.util.basic import make_build_dir
from finn.util.visualization import showInNetron, showSrc
import os
    
build_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"

In [None]:
import torch
from finn.util.test import get_test_model_trained
from brevitas.export import export_qonnx
from qonnx.util.cleanup import cleanup as qonnx_cleanup
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType

cnv = get_test_model_trained("CNV", 2, 2)
export_onnx_path = build_dir + "/end2end_cnv_w2a2_export.onnx"
export_qonnx(cnv, torch.randn(1, 3, 32, 32), export_onnx_path)
qonnx_cleanup(export_onnx_path, out_file=export_onnx_path)

In [None]:
showInNetron(build_dir+"/end2end_cnv_w2a2_export.onnx")

## Quick recap: how to set up the default builder flow for resource estimations <a id="recap_builder"></a>

In [None]:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_estimates_only"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")


cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_cfg.estimate_only_dataflow_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);
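
The `target_fps` and `synth_clk_period_ns` settings in the configuration above together imply a per-frame cycle budget. The arithmetic below is a back-of-the-envelope sketch; the function name `cycles_per_frame` is our own, and how the builder uses such a budget internally is not shown here.

```python
def cycles_per_frame(target_fps, clk_period_ns):
    """Clock cycles available per frame at the given clock and frame rate."""
    clk_hz = 1e9 / clk_period_ns
    return int(clk_hz / target_fps)


# Values taken from the DataflowBuildConfig above.
budget = cycles_per_frame(target_fps=1000000, clk_period_ns=10.0)
print(budget)  # 100
```

A budget of 100 cycles per frame is a very ambitious target, so it is worth checking in the generated estimate reports which throughput the build is actually expected to reach.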

In [None]:
showInNetron(build_dir+"/output_estimates_only/intermediate_models/step_convert_to_hls.onnx")

## Build steps <a id="build_step"></a>

In [None]:
print("\n".join(build_cfg.estimate_only_dataflow_steps))

You can have a closer look at each step by either using the `showSrc()` function or by accessing the docstring.

In [None]:
import finn.builder.build_dataflow_steps as build_dataflow_steps
print(build_dataflow_steps.step_tidy_up.__doc__)

In [None]:
import finn.builder.build_dataflow_steps as build_dataflow_steps
showSrc(build_dataflow_steps.step_tidy_up)
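
Outside of a notebook, the standard-library `inspect` module offers similar introspection. The sketch below uses a stand-in function (`step_example` is hypothetical) so that it runs without FINN installed; the same calls should work on any regular Python function, including the build steps.

```python
import inspect


def step_example(model, cfg):
    """Hypothetical build step that returns the model unchanged."""
    return model


# getdoc() returns the cleaned-up docstring, signature() the call signature.
doc = inspect.getdoc(step_example)
sig = str(inspect.signature(step_example))

print(doc)
print(sig)
```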

### How to make a custom build step <a id="custom_step"></a>

In [None]:
from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels

def custom_step_add_pre_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    ishape = model.get_tensor_shape(model.graph.input[0].name)
    # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
    preproc = ToTensor()
    export_qonnx(preproc, torch.randn(ishape), "preproc.onnx", opset_version=11)
    preproc_model = ModelWrapper("preproc.onnx")
    # set input finn datatype to UINT8
    preproc_model.set_tensor_datatype(preproc_model.graph.input[0].name, DataType["UINT8"])
    model = model.transform(MergeONNXModels(preproc_model))
    return model
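
The custom step above follows a simple contract: it takes the model and the build configuration and returns a (possibly transformed) model. The pure-Python sketch below mimics how a list that mixes step names and callables could be executed in sequence. `run_steps`, `NAMED_STEPS` and the toy steps are hypothetical stand-ins that operate on a plain list, not the FINN implementation.

```python
def step_double(model, cfg):
    """Toy built-in step: double every element of the 'model'."""
    return [2 * x for x in model]


def custom_step_add_one(model, cfg):
    """Toy custom step: add one to every element of the 'model'."""
    return [x + 1 for x in model]


# Built-in steps are looked up by name; custom steps are passed as callables.
NAMED_STEPS = {"step_double": step_double}


def run_steps(model, cfg, steps):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        fn = NAMED_STEPS[step] if isinstance(step, str) else step
        model = fn(model, cfg)
    return model


result = run_steps([1, 2, 3], cfg=None, steps=[custom_step_add_one, "step_double"])
print(result)  # [4, 6, 8]
```

Because every step receives the output of the previous one, the position of a custom step in the list matters.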
    

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_pre_proc"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_target_fps_parallelization",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates)

In [None]:
showInNetron(build_dir+"/output_pre_proc/intermediate_models/custom_step_add_pre_proc.onnx")

In [None]:
from qonnx.transformation.insert_topk import InsertTopK

def custom_step_add_post_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    model = model.transform(InsertTopK(k=1))
    return model
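
`InsertTopK(k=1)` appends a TopK node so that the network returns the index of the highest-scoring class rather than all raw class scores. The following plain-Python sketch shows the idea; `topk_indices` is a hypothetical helper, the scores are made up, and tie-breaking in the actual ONNX operator may differ.

```python
def topk_indices(scores, k=1):
    """Return the indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


scores = [0.1, 2.5, -1.0, 0.7]
print(topk_indices(scores, k=1))  # [1]
```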

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_pre_and_post_proc"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_target_fps_parallelization",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

In [None]:
showInNetron(build_dir+"/output_pre_and_post_proc/intermediate_models/step_convert_to_hls.onnx")

## Folding configuration json <a id="folding_config"></a>

To learn about the influence of folding factors/parallelism in FINN, please have a look at this notebook: 

In [None]:
import json

with open(build_dir+"/output_pre_and_post_proc/auto_folding_config.json", 'r') as json_file:
    folding_config = json.load(json_file)

print(json.dumps(folding_config, indent=1))

* Hardware configuration for each layer
    * FIFO depths
    * Type of memory/compute resources to be used
    * Parallelism along different dimensions (`PE`, `SIMD`)
    * Baked-in, decoupled or external parameters
* Influences almost all flows
    * `step_apply_folding_config`
* Values tuned for performance & footprint
* Many additional constraints not visible from .json
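
To make the structure more concrete, here is a hand-written fragment in the general shape of such a file, together with a small loop that summarizes it. The node names and all values are illustrative assumptions; the file printed above is the authoritative reference for this network.

```python
import json

# Illustrative fragment only: node names and values are made up.
example_config = json.loads("""
{
  "Defaults": {},
  "ExampleLayer_0": {"PE": 4, "SIMD": 3, "ram_style": "auto"},
  "ExampleFIFO_0": {"depth": 32, "ram_style": "auto"}
}
""")


def summarize(config):
    """Return one 'node: key=value, ...' line per node that has settings."""
    lines = []
    for node, attrs in config.items():
        if not attrs:
            continue  # skip empty sections
        settings = ", ".join(f"{k}={v}" for k, v in sorted(attrs.items()))
        lines.append(f"{node}: {settings}")
    return lines


for line in summarize(example_config):
    print(line)
```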

In [None]:
with open(build_dir+"/output_pre_and_post_proc/report/estimate_layer_resources.json", 'r') as json_file:
    json_object = json.load(json_file)

print(json.dumps(json_object["total"], indent=1))

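You can also edit the folding configuration manually. Here, we generate two new folding configurations: one that sets `ram_style` to LUTRAM (`distributed`) wherever that attribute is present, and one that sets it to BRAM (`block`).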

In [None]:
# Set all ram_style to LUT RAM
for key in folding_config:
    if "ram_style" in folding_config[key]:
        folding_config[key]["ram_style"] = "distributed" 
# Save as .json    
with open("folding_config_all_lutram.json", "w") as jsonFile:
    json.dump(folding_config, jsonFile)
         
# Set all ram_style to BRAM
for key in folding_config:
    if "ram_style" in folding_config[key]:
        folding_config[key]["ram_style"] = "block" 
# Save as .json    
with open("folding_config_all_bram.json", "w") as jsonFile:
    json.dump(folding_config, jsonFile)
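
The cell above edits `folding_config` in place, which works here because each variant is written to disk before the next change is made. If you would rather keep the original dictionary untouched, a copy-based helper is one option. This is only a sketch; `with_ram_style` and the sample dictionary are hypothetical.

```python
import copy


def with_ram_style(config, style):
    """Return a deep copy of config with every existing ram_style set to style."""
    new_config = copy.deepcopy(config)
    for attrs in new_config.values():
        if "ram_style" in attrs:
            attrs["ram_style"] = style
    return new_config


# Made-up sample in the shape of a folding configuration.
sample = {"Defaults": {}, "ExampleLayer_0": {"PE": 4, "ram_style": "auto"}}

all_lutram = with_ram_style(sample, "distributed")
all_bram = with_ram_style(sample, "block")

print(sample["ExampleLayer_0"]["ram_style"])      # auto
print(all_lutram["ExampleLayer_0"]["ram_style"])  # distributed
print(all_bram["ExampleLayer_0"]["ram_style"])    # block
```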

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_all_lutram"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_steps,
    folding_config_file = "folding_config_all_lutram.json",
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

In [None]:
showInNetron(build_dir+"/output_all_lutram/intermediate_models/step_generate_estimate_reports.onnx")

In [None]:
with open(build_dir+"/output_all_lutram/report/estimate_layer_resources.json", 'r') as json_file:
    json_object = json.load(json_file)

print(json.dumps(json_object["total"], indent=1))

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_all_bram"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_steps,
    folding_config_file = "folding_config_all_bram.json",
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

In [None]:
showInNetron(build_dir+"/output_all_bram/intermediate_models/step_generate_estimate_reports.onnx")

In [None]:
with open(build_dir+"/output_all_bram/report/estimate_layer_resources.json", 'r') as json_file:
    json_object = json.load(json_file)

print(json.dumps(json_object["total"], indent=1))
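
To compare the two builds side by side rather than reading the printouts separately, something like the following could be used. The resource names and numbers below are made-up placeholders that only exercise the function; substitute the `total` dictionaries loaded from the two reports. `compare_totals` is a hypothetical helper.

```python
def compare_totals(a, b):
    """Map each resource name to (value in a, value in b, b minus a)."""
    rows = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, 0), b.get(key, 0)
        rows[key] = (va, vb, vb - va)
    return rows


# Placeholder values, NOT results from an actual build.
lutram_total = {"LUT": 100.0, "BRAM_18K": 0.0}
bram_total = {"LUT": 60.0, "BRAM_18K": 8.0}

for name, (x, y, delta) in compare_totals(lutram_total, bram_total).items():
    print(f"{name:10s} {x:10.1f} {y:10.1f} {delta:+10.1f}")
```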

## Additional builder arguments <a id="builder_arg"></a>

### Verification steps <a id="verify"></a>

In [None]:
import finn.builder.build_dataflow_steps as build_dataflow_steps
showSrc(build_dataflow_steps.step_tidy_up)

In [None]:
showSrc(build_cfg.VerificationStepType)

In [None]:
# Get golden io pair from Brevitas and save as .npy files
from finn.util.test import get_trained_network_and_ishape, get_example_input, get_topk
import numpy as np


(brevitas_model, ishape) = get_trained_network_and_ishape("cnv", 2, 2)
input_tensor_npy = get_example_input("cnv")
input_tensor_torch = torch.from_numpy(input_tensor_npy).float()
input_tensor_torch = ToTensor().forward(input_tensor_torch).detach()
output_tensor_npy = brevitas_model.forward(input_tensor_torch).detach().numpy()
output_tensor_npy = get_topk(output_tensor_npy, k=1)

np.save("input.npy", input_tensor_npy)
np.save("expected_output.npy", output_tensor_npy)
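
Conceptually, a verification step runs the saved input through an intermediate model and compares the result with the saved expected output. The sketch below shows such a comparison in plain Python on flat lists; `outputs_match` and its tolerance are assumptions made for illustration, not the builder's actual check.

```python
import math


def outputs_match(expected, produced, atol=1e-3):
    """Return True if both sequences have equal length and close values."""
    if len(expected) != len(produced):
        return False
    return all(
        math.isclose(e, p, abs_tol=atol) for e, p in zip(expected, produced)
    )


print(outputs_match([3.0], [3.0]))  # True
print(outputs_match([3.0], [5.0]))  # False
```

Since the expected output here is a top-1 class index, any mismatch indicates that an intermediate model predicts a different class than the Brevitas reference for this input.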

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_with_verification"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_target_fps_parallelization",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 80,
    target_fps          = 1000000,
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg400-1",
    steps               = build_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ],
    verify_steps=[
        build_cfg.VerificationStepType.QONNX_TO_FINN_PYTHON,
        build_cfg.VerificationStepType.TIDY_UP_PYTHON,
        build_cfg.VerificationStepType.STREAMLINED_PYTHON,
    ]
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

### Examples for additional builder arguments <a id="example_args"></a>

#### Standalone Thresholds

[Figure placeholder: im2col + matmul + multithreshold]
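
For reference, a multi-threshold activation maps each input to the number of thresholds it meets or exceeds, which yields a small unsigned integer. The following is a minimal sketch that ignores any output scale or bias; `multithreshold` is a stand-in written for this example, not the actual operator implementation, and the threshold values are made up.

```python
def multithreshold(x, thresholds):
    """Count how many thresholds the input value meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


# Three thresholds give four output levels, i.e. a 2-bit activation.
thresholds = [-0.5, 0.0, 0.5]
outputs = [multithreshold(v, thresholds) for v in (-1.0, -0.5, 0.2, 3.0)]
print(outputs)  # [0, 1, 2, 3]
```

As we understand it, setting `standalone_thresholds = True` asks the builder to implement this thresholding as its own layer instead of folding it into the preceding matrix-vector unit.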

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_standalone_thresholds"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_target_fps_parallelization",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir            = estimates_output_dir,
    mvau_wwidth_max       = 80,
    target_fps            = 1000000,
    synth_clk_period_ns   = 10.0,
    fpga_part             = "xc7z020clg400-1",
    standalone_thresholds = True,
    steps                 = build_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ],
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

In [None]:
showInNetron(build_dir+"/output_standalone_thresholds/intermediate_models/step_generate_estimate_reports.onnx")

#### RTL Convolutional Input Generator

In [None]:
model_dir = os.environ['FINN_ROOT'] + "/notebooks/advanced"
model_file = model_dir + "/end2end_cnv_w2a2_export.onnx"

estimates_output_dir = "output_rtl_swg"

# Delete previous run results if they exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")

build_steps = [
    custom_step_add_pre_proc,
    custom_step_add_post_proc,
    "step_qonnx_to_finn",
    "step_tidy_up",
    "step_streamline",
    "step_convert_to_hls",
    "step_create_dataflow_partition",
    "step_target_fps_parallelization",
    "step_apply_folding_config",
    "step_minimize_bit_width",
    "step_generate_estimate_reports",
]

cfg_estimates = build.DataflowBuildConfig(
    output_dir             = estimates_output_dir,
    mvau_wwidth_max        = 80,
    target_fps             = 1000000,
    synth_clk_period_ns    = 10.0,
    fpga_part              = "xc7z020clg400-1",
    force_rtl_conv_inp_gen = True,
    steps                  = build_steps,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
    ],
)

In [None]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates);

In [None]:
showInNetron(build_dir+"/output_rtl_swg/intermediate_models/step_generate_estimate_reports.onnx")

### Other builder arguments <a id="other_args"></a>

Let's have a look at the remaining builder arguments. To do so, we list the public attributes of `DataflowBuildConfig`, leaving out private and special names.

In [None]:
# Keep only public attribute names (drop those starting with an underscore)
builder_args = [m for m in dir(build_cfg.DataflowBuildConfig) if not m.startswith('_')]
print("\n".join(builder_args))

The list includes attributes that come from the dataclasses-json class: `to_dict`, `to_json`, `schema`, `from_json`, `from_dict`. These are not specific to the FINN builder. Some of the arguments we have already seen in the Cybersecurity notebook and in this notebook, e.g. `target_fps`, `fpga_part`, `folding_config_file`, ...
Please have a look here and scroll through the available builder arguments: https://github.com/Xilinx/finn/blob/dev/src/finn/builder/build_dataflow_config.py#L155
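
An alternative to filtering the output of `dir()` by hand is `dataclasses.fields`, which lists only the declared fields and therefore skips helper methods. The sketch uses a small stand-in dataclass (`ExampleConfig` is hypothetical) so that it runs without FINN installed; we assume, based on the dataclasses-json attributes mentioned above, that the builder configuration is itself a dataclass.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for a builder configuration."""

    output_dir: str
    target_fps: Optional[int] = None
    synth_clk_period_ns: float = 10.0

    def to_dict(self):
        """Helper method; it does not show up in fields()."""
        return {f.name: getattr(self, f.name) for f in fields(self)}


field_names = [f.name for f in fields(ExampleConfig)]
print(field_names)
```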

So far in this notebook, we have only looked at configurations up to the generation of estimate reports. Many of these builder arguments only become relevant at a later stage of the FINN flow.

In [None]:
print("\n".join(build_cfg.default_build_dataflow_steps))

You can have a closer look at each step by either using the `showSrc()` function or by accessing the docstring.

In [None]:
import finn.builder.build_dataflow_steps as build_dataflow_steps
print(build_dataflow_steps.step_create_dataflow_partition.__doc__)