# FINN Build
------------------------------------------------------------------------------------------------------------------

**Important: This notebook depends on the 2-cybersecurity-finn-verification notebook and on the 1-cybersecurity-Brevitas-1bit notebook, because it uses models created by these notebooks. Please make sure the required .onnx files have been generated before running this notebook.**

This notebook will walk you through the build flow and the model deployment into FINN.

## Outline
-------------
1. [Define Some Necessary Parameters](#define_params)
2. [Build the Model](#build_model)
3. [Explore the Build Generated Files](#explore_generated)

    3.1 [Reports](#reports)  
    3.2 [Intermediate Models](#intermediate_models)

4. [Play With Parameters](#play_params)

    4.1 [Reports](#4reports)  
    4.2 [Intermediate Models](#4intermediate_models)

Let's have a look at our model before we start building it. For this, we will use the showInNetron function.

In [1]:
my_model_file = "unsw_nb15_quantized_mlp_1bit.onnx"

from finn.util.visualization import showSrc, showInNetron
showInNetron(my_model_file)

Serving 'unsw_nb15_quantized_mlp_1bit.onnx' at http://0.0.0.0:8081
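
Because this notebook relies on a model exported by the earlier notebooks, it can be worth confirming that the file is actually present before starting a build. The following is a minimal sketch; the helper name `missing_files` is hypothetical and not part of FINN.

```python
import os

def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]

# File name taken from this notebook; extend the list if other models are needed.
required = ["unsw_nb15_quantized_mlp_1bit.onnx"]

missing = missing_files(required)
if missing:
    print("Missing model files, run the earlier notebooks first:", missing)
else:
    print("All required model files found.")
```

If the check reports a missing file, re-running the notebooks named at the top of this page should regenerate it.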


# 1. Define Some Necessary Parameters <a id="define_params"></a>

All documentation on the FINN build can be found [here](https://finn-dev.readthedocs.io/en/latest/source_code/finn.builder.html).
First, we need to define some variables, such as the model name, the model file and the platform name.
We also need to define some constants that are used throughout the FINN build. These are:

* **target_fps:** Target inference performance in frames per second. Note that the target may not be achievable due to specific layer constraints or resource limitations of the FPGA. If parallelization attributes are specified as part of folding_config_file, they override the target_fps setting.
* **mvau_wwidth_max:** Controls the maximum width of the per-PE MVAU stream while exploring the parallelization attributes to reach target_fps. Only relevant if target_fps is specified. Set this to a large value (e.g. 10000) if targeting full unfolding or very high performance.
* **synth_clk_period_ns:** Target clock period (in nanoseconds) for Vivado synthesis, e.g. synth_clk_period_ns=5.0 will target a 200 MHz clock. If hls_clk_period_ns is not specified, it defaults to this value.
* **output_dir:** The directory into which the final build outputs are written.
* **save_intermediate_models:** Whether intermediate ONNX files are saved during the build process. These can be useful for debugging if the build fails.
* **shell_flow_type:** Target shell flow, only needed for generating full bitfiles where the FINN design is integrated into a shell. See [documentation](https://finn-dev.readthedocs.io/en/latest/source_code/finn.builder.html) of ShellFlowType for options.
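
To get a feel for how synth_clk_period_ns and target_fps interact, the sketch below converts a clock period to a frequency and computes how many clock cycles are available per frame. The helper names are hypothetical, and the cycle budget assumes (as a simplification) that throughput is set by the slowest layer handling one frame at a time.

```python
def clk_period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns

def cycles_per_frame_budget(target_fps, period_ns):
    """Clock cycles available per frame at the given frame rate and clock period."""
    return 1e9 / (target_fps * period_ns)

print(clk_period_ns_to_mhz(5.0))               # 200.0 (the example from the list above)
print(clk_period_ns_to_mhz(10.0))              # 100.0 (the value used in this notebook)
print(cycles_per_frame_budget(100000, 10.0))   # 1000.0 cycles per frame
```

Under this assumption, a target of 100000 fps at a 10 ns clock leaves the slowest layer about 1000 cycles per frame, which is why reaching it requires a lot of parallelism.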


In [2]:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg

my_model_file = "unsw_nb15_quantized_mlp_1bit.onnx"
my_model_name = "mlp_unsw_nb15"
platform_name = "Pynq-Z1"
output_dir    = "output_%s_%s" % (my_model_name, platform_name)

target_fps = 100000
mvau_wwidth_max = 10000
synth_clk_period_ns = 10.0
save_intermediate_models = True
shell_flow_type = build_cfg.ShellFlowType.VIVADO_ZYNQ
enable_build_pdb_debug = False 

# 2. Build the Model <a id="build_model"></a>

The following shows how to build our model. We create a build configuration, which is then passed to the build_dataflow_cfg function. In this build configuration we can set all desired attributes.

In [3]:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg

cfg = build.DataflowBuildConfig(
    # can specify detailed folding/FIFO/etc config with:
    # folding_config_file="folding_config.json",  
    
    output_dir          = output_dir,
    target_fps          = target_fps,
    mvau_wwidth_max     = mvau_wwidth_max,
    synth_clk_period_ns = synth_clk_period_ns,
    board               = platform_name,
    shell_flow_type     = shell_flow_type,
    enable_build_pdb_debug=enable_build_pdb_debug,
    
    generate_outputs=[
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.OOC_SYNTH,
        #build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ],
    save_intermediate_models=save_intermediate_models,
)

build.build_dataflow_cfg(my_model_file, cfg)


Building dataflow accelerator from unsw_nb15_quantized_mlp_1bit.onnx
Outputs will be generated at output_mlp_unsw_nb15_Pynq-Z1
Build log is at output_mlp_unsw_nb15_Pynq-Z1/build_dataflow.log
Running step: step_tidy_up [1/14]
Running step: step_streamline [2/14]
Running step: step_convert_to_hls [3/14]
Running step: step_create_dataflow_partition [4/14]
Running step: step_target_fps_parallelization [5/14]
Running step: step_apply_folding_config [6/14]
Running step: step_generate_estimate_reports [7/14]
Running step: step_hls_ipgen [8/14]
enable_build_pdb_debug not set in build config, exiting...
Build failed


multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/workspace/finn/src/finn/transformation/fpgadataflow/hlssynth_ip.py", line 68, in applyNodeLocal
    inst.ipgen_singlenode_code()
  File "/workspace/finn/src/finn/custom_op/fpgadataflow/hlscustomop.py", line 321, in ipgen_singlenode_code
    assert os.path.isdir(ipgen_path), "IPGen failed: %s not found" % (ipgen_path)
AssertionError: IPGen failed: /tmp/finn_dev_osboxes/code_gen_ipgen_StreamingFCLayer_Batch_3_sdrjg2g4/project_StreamingFCLayer_Batch_3 not found
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/finn/src/finn/builder/build_dataflow.py", line 126, in build_dataflow_cfg
    model = transfor

-1
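
The build returned -1, and the notebook output only shows part of the traceback. The build output above names a log file, so one way to investigate is to print its last lines. This is a minimal sketch; the `tail` helper is hypothetical, and the log path is the one printed in the build output above.

```python
import os
from collections import deque

def tail(path, n=20):
    """Return the last `n` lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]

# Path as reported by the build output above.
log_path = os.path.join("output_mlp_unsw_nb15_Pynq-Z1", "build_dataflow.log")

if os.path.isfile(log_path):
    print("\n".join(tail(log_path)))
else:
    print("No build log found at", log_path)
```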

# 3. Explore the Build Generated Files <a id="explore_generated"></a>

All outputs generated by the build can be found in the folder defined earlier in the "output_dir" variable.

Let's see what is inside this directory.

In [4]:
!ls $output_dir

build_dataflow.log  intermediate_models  report


Inside the build directory we have:
* The **"build_dataflow.log"** file logs the output of the build flow.

* The **"report" folder** contains all reports created during the build.

* The **"intermediate_models" folder** holds the models saved after each intermediate step.


## 3.1 Reports <a id="reports"></a>

Inside the "report" folder there are the following reports:
* estimate_layer_config_alternatives.json;
* estimate_layer_cycles.json;
* estimate_layer_resources.json;
* estimate_network_performance.json;
* op_and_param_counts.json;

Some of them are opened and briefly explained below.

#### Network Performance Estimate
As we can see, this report shows the estimated throughput in frames per second (fps), the estimated latency in nanoseconds and the bottleneck layer, i.e. the layer that needs the most cycles (max_cycles_node_name).

In [46]:
# Alternative: load the report into a Python dict instead of printing it
# import json
# with open(output_dir + '/report/estimate_network_performance.json') as f:
#     dict_ = json.load(f)
# dict_

!cat  $output_dir/report/estimate_network_performance.json

{
  "critical_path_cycles": 86176,
  "max_cycles": 75904,
  "max_cycles_node_name": "StreamingFCLayer_Batch_0",
  "estimated_throughput_fps": 1317.4536256323777,
  "estimated_latency_ns": 861760.0
}
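
The throughput and latency figures in this report can be reproduced from the cycle counts and the clock period. The sketch below assumes that throughput is determined by the slowest layer (max_cycles) and latency by critical_path_cycles, both at the 10 ns clock period set earlier; this matches the numbers shown above, but it is a reconstruction rather than a quote of the FINN source.

```python
clk_period_ns = 10.0          # synth_clk_period_ns used in this notebook
max_cycles = 75904            # cycles of the slowest layer, from the report
critical_path_cycles = 86176  # from the report

# One frame leaves the slowest layer every max_cycles clock cycles.
estimated_throughput_fps = 1e9 / (max_cycles * clk_period_ns)
# A single frame needs critical_path_cycles clock cycles end to end.
estimated_latency_ns = critical_path_cycles * clk_period_ns

print(estimated_throughput_fps)
print(estimated_latency_ns)
```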

#### Operations and Parameter Count
This file shows the number of multiply-accumulate (MAC) operations and parameters for each layer, together with the totals. Note that the largest layer (here the first one, which has the most neurons and weights) also performs the most multiply-accumulate operations.

In [6]:
!cat  $output_dir/report/op_and_param_counts.json

{
  "StreamingFCLayer_Batch_0": {
    "op_mac_1bx1b": 75904,
    "param_weight_1b": 75904,
    "param_threshold_10b": 128
  },
  "StreamingFCLayer_Batch_1": {
    "op_mac_1bx1b": 8192,
    "param_weight_1b": 8192,
    "param_threshold_8b": 64
  },
  "StreamingFCLayer_Batch_2": {
    "op_mac_1bx1b": 2048,
    "param_weight_1b": 2048,
    "param_threshold_7b": 32
  },
  "StreamingFCLayer_Batch_3": {
    "op_mac_1bx1b": 32,
    "param_weight_1b": 32,
    "param_threshold_6b": 1
  },
  "total": {
    "op_mac_1bx1b": 86176.0,
    "param_weight_1b": 86176.0,
    "param_threshold_10b": 128.0,
    "param_threshold_8b": 64.0,
    "param_threshold_7b": 32.0,
    "param_threshold_6b": 1.0
  }
}
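
The per-layer counts can be cross-checked against the total, and they also hint at the layer shapes. The sketch below copies the values from the report above; it assumes one threshold per output neuron, so that the number of inputs of a layer is its MAC count divided by its threshold count.

```python
# Values copied from op_and_param_counts.json above.
report = {
    "StreamingFCLayer_Batch_0": {"op_mac_1bx1b": 75904, "param_threshold_10b": 128},
    "StreamingFCLayer_Batch_1": {"op_mac_1bx1b": 8192, "param_threshold_8b": 64},
    "StreamingFCLayer_Batch_2": {"op_mac_1bx1b": 2048, "param_threshold_7b": 32},
    "StreamingFCLayer_Batch_3": {"op_mac_1bx1b": 32, "param_threshold_6b": 1},
}

total_macs = sum(layer["op_mac_1bx1b"] for layer in report.values())
print("total MACs:", total_macs)

shapes = {}
for name, layer in report.items():
    macs = layer["op_mac_1bx1b"]
    # Assumption: one threshold per output neuron.
    outputs = next(v for k, v in layer.items() if k.startswith("param_threshold"))
    shapes[name] = (macs // outputs, outputs)
    print(name, "inputs x outputs =", shapes[name])
```

Under that assumption, the output size of each layer matches the input size of the next one, which is a useful sanity check on the report.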

#### Resources per Layer Estimate
This file shows the estimated hardware resources used by each layer and the total number of BRAMs, LUTs, URAMs and DSPs needed to implement this model in hardware.

In [7]:
!cat  $output_dir/report/estimate_layer_resources.json

{
  "StreamingFCLayer_Batch_0": {
    "BRAM_18K": 17,
    "BRAM_efficiency": 0.24223856209150327,
    "LUT": 4264,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_1": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.4444444444444444,
    "LUT": 433,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_2": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.1111111111111111,
    "LUT": 350,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_3": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.001736111111111111,
    "LUT": 327,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "total": {
    "BRAM_18K": 20.0,
    "LUT": 5374.0,
    "URAM": 0.0,
    "DSP": 0.0
  }
}
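
To put these totals into perspective, they can be compared with the capacity of the target device. The sketch below assumes the Pynq-Z1 carries a Zynq XC7Z020 with 53,200 LUTs, 280 BRAM_18K blocks and 220 DSP slices; these capacity figures are not part of the FINN report and should be checked against the device data sheet.

```python
# Totals copied from estimate_layer_resources.json above.
totals = {"BRAM_18K": 20, "LUT": 5374, "URAM": 0, "DSP": 0}

# Assumption: approximate capacity of the Zynq XC7Z020 on the Pynq-Z1.
device = {"BRAM_18K": 280, "LUT": 53200, "DSP": 220}

utilization = {res: 100.0 * totals[res] / cap for res, cap in device.items()}
for res, pct in utilization.items():
    print("%-8s %5d of %6d (%.1f%%)" % (res, totals[res], device[res], pct))
```

These are estimates made before synthesis, so the figures reported by Vivado after a successful build may differ.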

## 3.2 Intermediate Models <a id="intermediate_models"></a>

Inside the **intermediate_models** folder we can find all intermediate models generated by the FINN build flow:
* 1_step_tidy_up.onnx
* 2_step_streamline.onnx
* 3_step_convert_to_hls.onnx
* 4_step_create_dataflow_partition.onnx
* 5_step_target_fps_parallelization.onnx
* 6_step_apply_folding_config.onnx
* 7_step_generate_estimate_reports.onnx
* dataflow_parent.onnx

We can look at these models with showInNetron.

In [8]:
layer = "2_step_streamline"
showInNetron(output_dir +"/intermediate_models/" + layer + ".onnx")


Stopping http://0.0.0.0:8081
Serving 'output_mlp_unsw_nb15_Pynq-Z1/intermediate_models/2_step_streamline.onnx' at http://0.0.0.0:8081


In [9]:
layer = "3_step_convert_to_hls"
showInNetron(output_dir +"/intermediate_models/" + layer + ".onnx")


Stopping http://0.0.0.0:8081
Serving 'output_mlp_unsw_nb15_Pynq-Z1/intermediate_models/3_step_convert_to_hls.onnx' at http://0.0.0.0:8081
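
When stepping through several intermediate models, it can be convenient to build the paths with a small helper and to list the files in build order. This is a sketch using only the standard library; the helper names are hypothetical, and the file names are the ones listed in this section.

```python
import os

def intermediate_model_path(output_dir, step_name):
    """Build the path of the ONNX file saved for a given build step."""
    return os.path.join(output_dir, "intermediate_models", step_name + ".onnx")

def step_order(filename):
    """Sort key: numbered steps in numeric order first, other files last."""
    prefix = filename.split("_", 1)[0]
    return (0, int(prefix)) if prefix.isdigit() else (1, filename)

files = [
    "dataflow_parent.onnx",
    "3_step_convert_to_hls.onnx",
    "1_step_tidy_up.onnx",
    "2_step_streamline.onnx",
]
print(sorted(files, key=step_order))
print(intermediate_model_path("output_mlp_unsw_nb15_Pynq-Z1", "2_step_streamline"))
```

The resulting path can be passed to showInNetron in the same way as in the cells above.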


-----------------------------------------------------------

# 4. Play With Parameters <a id="play_params"></a>

Now let's decrease "target_fps" from 100000 to 100 and see what happens.

In [56]:
output_dir    = "output2_%s_%s" % (my_model_name, platform_name)

target_fps = 100
mvau_wwidth_max = 10000
synth_clk_period_ns = 10.0
save_intermediate_models = True
shell_flow_type = build_cfg.ShellFlowType.VIVADO_ZYNQ
#large_fifo_mem_style = "distributed" 
#default_mem_mode = "const" 
enable_build_pdb_debug = False 

In [55]:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg

cfg = build.DataflowBuildConfig(
    # can specify detailed folding/FIFO/etc config with:
    # folding_config_file="folding_config.json",  
    
    output_dir          = output_dir,
    target_fps          = target_fps,
    mvau_wwidth_max     = mvau_wwidth_max,
    synth_clk_period_ns = synth_clk_period_ns,
    board               = platform_name,
    shell_flow_type     = shell_flow_type,
    enable_build_pdb_debug= enable_build_pdb_debug,
    
    generate_outputs=[
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.OOC_SYNTH,
        #build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ],
    save_intermediate_models=save_intermediate_models,
)

build.build_dataflow_cfg(my_model_file, cfg)


Building dataflow accelerator from unsw_nb15_quantized_mlp_1bit.onnx
Outputs will be generated at output2_mlp_unsw_nb15_Pynq-Z1
Build log is at output2_mlp_unsw_nb15_Pynq-Z1/build_dataflow.log
Running step: step_tidy_up [1/14]
Running step: step_streamline [2/14]
Running step: step_convert_to_hls [3/14]
Running step: step_create_dataflow_partition [4/14]
Running step: step_target_fps_parallelization [5/14]
Running step: step_apply_folding_config [6/14]
Running step: step_generate_estimate_reports [7/14]
Running step: step_hls_ipgen [8/14]
enable_build_pdb_debug not set in build config, exiting...
Build failed


multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/workspace/finn/src/finn/transformation/fpgadataflow/hlssynth_ip.py", line 68, in applyNodeLocal
    inst.ipgen_singlenode_code()
  File "/workspace/finn/src/finn/custom_op/fpgadataflow/hlscustomop.py", line 321, in ipgen_singlenode_code
    assert os.path.isdir(ipgen_path), "IPGen failed: %s not found" % (ipgen_path)
AssertionError: IPGen failed: /tmp/finn_dev_osboxes/code_gen_ipgen_StreamingFCLayer_Batch_3_qdg9qmrq/project_StreamingFCLayer_Batch_3 not found
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/finn/src/finn/builder/build_dataflow.py", line 126, in build_dataflow_cfg
    model = transfor

-1

## 4.1 Reports <a id="4reports"></a>

#### Network Performance Estimate
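With a lower target, FINN needs less parallelism, so each layer is folded more heavily: we expect more cycles on the critical path, lower throughput and higher latency than in a build that reaches a higher target. The estimated throughput of about 1317 fps is still well above the requested 100 fps, and StreamingFCLayer_Batch_0 remains the bottleneck layer (the one with the most cycles).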

In [49]:
!cat  $output_dir/report/estimate_network_performance.json

{
  "critical_path_cycles": 86176,
  "max_cycles": 75904,
  "max_cycles_node_name": "StreamingFCLayer_Batch_0",
  "estimated_throughput_fps": 1317.4536256323777,
  "estimated_latency_ns": 861760.0
}

#### Operations and Parameters Count 
The number of multiply-accumulate operations remains the same, since the target only changes how the work is parallelized, not the network itself.

In [50]:
!cat  $output_dir/report/op_and_param_counts.json

{
  "StreamingFCLayer_Batch_0": {
    "op_mac_1bx1b": 75904,
    "param_weight_1b": 75904,
    "param_threshold_10b": 128
  },
  "StreamingFCLayer_Batch_1": {
    "op_mac_1bx1b": 8192,
    "param_weight_1b": 8192,
    "param_threshold_8b": 64
  },
  "StreamingFCLayer_Batch_2": {
    "op_mac_1bx1b": 2048,
    "param_weight_1b": 2048,
    "param_threshold_7b": 32
  },
  "StreamingFCLayer_Batch_3": {
    "op_mac_1bx1b": 32,
    "param_weight_1b": 32,
    "param_threshold_6b": 1
  },
  "total": {
    "op_mac_1bx1b": 86176.0,
    "param_weight_1b": 86176.0,
    "param_threshold_10b": 128.0,
    "param_threshold_8b": 64.0,
    "param_threshold_7b": 32.0,
    "param_threshold_6b": 1.0
  }
}

#### Layer Resources Estimate
Resource usage has decreased: the model now uses fewer BRAMs (8 instead of 20) and fewer LUTs (1348 instead of 5374).

In [51]:
!cat  $output_dir/report/estimate_layer_resources.json

{
  "StreamingFCLayer_Batch_0": {
    "BRAM_18K": 5,
    "BRAM_efficiency": 0.8236111111111111,
    "LUT": 357,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_1": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.4444444444444444,
    "LUT": 334,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_2": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.1111111111111111,
    "LUT": 330,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "StreamingFCLayer_Batch_3": {
    "BRAM_18K": 1,
    "BRAM_efficiency": 0.001736111111111111,
    "LUT": 327,
    "URAM": 0,
    "URAM_efficiency": 1,
    "DSP": 0
  },
  "total": {
    "BRAM_18K": 8.0,
    "LUT": 1348.0,
    "URAM": 0.0,
    "DSP": 0.0
  }
}
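
The effect of the lower target can be quantified by comparing the totals of the two resource reports. The sketch below copies the totals from the two estimate_layer_resources.json outputs shown in this notebook; the variable names are illustrative only.

```python
# Totals copied from the two estimate_layer_resources.json outputs.
build_high_fps = {"BRAM_18K": 20, "LUT": 5374}  # target_fps = 100000
build_low_fps = {"BRAM_18K": 8, "LUT": 1348}    # target_fps = 100

savings = {}
for res, before in build_high_fps.items():
    after = build_low_fps[res]
    savings[res] = 100.0 * (before - after) / before
    print("%-8s %5d -> %5d (%.1f%% less)" % (res, before, after, savings[res]))
```

Most of the difference comes from StreamingFCLayer_Batch_0, whose estimate drops from 17 to 5 BRAMs and from 4264 to 357 LUTs.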

## 4.2 Intermediate Models <a id="4intermediate_models"></a>
Now, let's have a look at the intermediate models.

In [52]:
layer = "2_step_streamline"
showInNetron(output_dir +"/intermediate_models/" + layer + ".onnx")


Stopping http://0.0.0.0:8081
Serving 'output2_mlp_unsw_nb15_Pynq-Z1/intermediate_models/2_step_streamline.onnx' at http://0.0.0.0:8081


In [53]:
layer = "3_step_convert_to_hls" 
showInNetron(output_dir +"/intermediate_models/" + layer + ".onnx")


Stopping http://0.0.0.0:8081
Serving 'output2_mlp_unsw_nb15_Pynq-Z1/intermediate_models/3_step_convert_to_hls.onnx' at http://0.0.0.0:8081
