In [None]:
from finn.util.visualization import showInNetron
from finn.core.modelwrapper import ModelWrapper
from finn.custom_op.registry import getCustomOp
import json

build_dir = "/workspace/finn"

with open("config.json", "r") as f:
    model_config = json.load(f)

model_name = model_config["model_name"]

model = ModelWrapper(build_dir+"/{}_post_optimiser.onnx".format(model_name))
showInNetron(build_dir+"/{}_post_optimiser.onnx".format(model_name))

In [None]:
from finn.analysis.fpgadataflow.res_estimation import res_estimation
estimate_layer_resources = model.analysis(res_estimation)

from finn.transformation.fpgadataflow.annotate_cycles import AnnotateCycles
from finn.analysis.fpgadataflow.dataflow_performance import dataflow_performance
# need to call AnnotateCycles before dataflow_performance
model = model.transform(AnnotateCycles())
estimate_network_performance = model.analysis(dataflow_performance)

print(estimate_layer_resources)
print(estimate_network_performance)

## 3. Hardware Build <a id='vivado'></a>

We're finally ready to start generating hardware from our network. Depending on whether you want to target a Zynq or Alveo platform, FINN offers two transformations that build the accelerator, integrate it into an appropriate shell and generate a bitfile. These are `ZynqBuild` and `VitisBuild` for Zynq and Alveo, respectively. In this notebook we'll demonstrate `ZynqBuild`, as Zynq boards are more common and bitfile generation completes much faster for the smaller FPGAs found on them.

As we will be dealing with FPGA synthesis tools in these tasks, we'll define helper variables that describe the PYNQ board name and the Xilinx FPGA part name that we are targeting, along with the target clock period in nanoseconds.
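As a small sketch of how these values relate to each other, the snippet below looks up a part name from a board name and converts a clock period in nanoseconds to a frequency. The `example_part_map` dictionary and both function names are hypothetical stand-ins for illustration. The authoritative mapping is FINN's `pynq_part_map`.

```python
# Hypothetical stand-in for FINN's pynq_part_map; entries are illustrative only.
example_part_map = {
    "Pynq-Z1": "xc7z020clg400-1",
    "Pynq-Z2": "xc7z020clg400-1",
}


def lookup_part(board, part_map):
    """Return the FPGA part for a board, with a readable error for unknown names."""
    try:
        return part_map[board]
    except KeyError:
        supported = ", ".join(sorted(part_map))
        raise ValueError(
            "Unknown board {!r}; supported: {}".format(board, supported)
        ) from None


def clk_period_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    return 1000.0 / period_ns


print(lookup_part("Pynq-Z1", example_part_map))
print(clk_period_to_mhz(10))  # a 10 ns period corresponds to 100 MHz
```

Failing early on an unknown board name is cheaper than discovering the mistake partway through a long synthesis run.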

In [None]:
# print the names of the supported PYNQ boards
from finn.util.basic import pynq_part_map
print(pynq_part_map.keys())

In [None]:
# change this if you have a different PYNQ board, see list above
pynq_board = "Pynq-Z1"
fpga_part = pynq_part_map[pynq_board]
target_clk_ns = 10

In previous versions of FINN, we had to manually go through several steps to generate HLS code, stitch IP, create a PYNQ project and run synthesis. All these steps are now performed by the `ZynqBuild` transform (or the `VitisBuild` transform for Alveo). **As this involves calling HLS synthesis and Vivado synthesis, this transformation will run for some time (up to half an hour depending on your PC).**

In [None]:
from finn.transformation.fpgadataflow.make_zynq_proj import ZynqBuild
model = ModelWrapper(build_dir+"/{}_set_folding_factors.onnx".format(model_name))
model = model.transform(ZynqBuild(platform = pynq_board, period_ns = target_clk_ns))

In [None]:
model.save(build_dir + "/{}_post_synthesis.onnx".format(model_name))

### Examining the generated outputs <a id='gen_outputs'></a>

Let's start by viewing the post-synthesis model in Netron:

In [None]:
showInNetron(build_dir + "/{}_post_synthesis.onnx".format(model_name))

We can see that our sequence of HLS layers has been replaced with `StreamingDataflowPartition`s, each of which points to a different ONNX file. You can open a Netron session for each of them to view their contents. Here, the first and last partitions contain only an `IODMA` node, which was inserted automatically to move data between DRAM and the accelerator. Let's take a closer look at the middle partition, which contains all our layers:

In [None]:
model = ModelWrapper(build_dir + "/{}_post_synthesis.onnx".format(model_name))
sdp_node_middle = getCustomOp(model.graph.node[1])
postsynth_layers = sdp_node_middle.get_nodeattr("model")

showInNetron(postsynth_layers)

We can see that `StreamingFIFO` and `StreamingDataWidthConverter` instances have been automatically inserted into the graph prior to hardware build. Transformations like `ZynqBuild` use the `metadata_props` of the model to put in additional metadata information relevant to the results of the transformation. Let's examine the metadata for the current graph containing all layers:

In [None]:
model = ModelWrapper(postsynth_layers)
model.model.metadata_props

Here we see that a Vivado project was built to create what we call the `stitched IP`, in which all the IP blocks implementing the various layers are stitched together. You can view this stitched block design in Vivado, or [here](StreamingDataflowPartition_1.pdf) as an exported PDF.

Moving back to the top-level model, recall that `ZynqBuild` will create a Vivado project and synthesize it, so it will be creating metadata entries related to the paths and files that were created:

In [None]:
model = ModelWrapper(build_dir + "/{}_post_synthesis.onnx".format(model_name))
model.model.metadata_props

Here, we can see the directories that were created for the PYNQ driver (`pynq_driver_dir`) and the Vivado synthesis project (`vivado_pynq_proj`), as well as the locations of the bitfile, hardware handoff file and synthesis report.
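The metadata is a list of key/value entries, so it can be handy to collect it into a plain dictionary before inspecting individual paths. The following is a minimal sketch under that assumption: `metadata_to_dict` is a hypothetical helper, and the `SimpleNamespace` objects with made-up paths stand in for the real entries of `model.model.metadata_props`.

```python
from types import SimpleNamespace


def metadata_to_dict(props):
    """Collect entries exposing .key and .value into a plain dictionary."""
    return {entry.key: entry.value for entry in props}


# Hypothetical entries standing in for model.model.metadata_props;
# the paths are made up for illustration.
example_props = [
    SimpleNamespace(key="pynq_driver_dir", value="/tmp/example/pynq_driver"),
    SimpleNamespace(key="vivado_pynq_proj", value="/tmp/example/vivado_proj"),
]

meta = metadata_to_dict(example_props)
for key in sorted(meta):
    print("{}: {}".format(key, meta[key]))
```

With a real model, the same helper could be applied to `model.model.metadata_props` directly, since each entry there carries a `key` and a `value` field.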

In [None]:
! ls {model.get_metadata_prop("vivado_pynq_proj")}

Feel free to examine the generated Vivado project to get a feel for how the system-level integration is performed for the FINN-generated "stitched IP", which appears as `StreamingDataflowPartition_1` in the top-level block design. You can see it as a block diagram exported to PDF [here](top.pdf).


## 4. PYNQ Deployment <a id='hw_test'></a>

* [Deployment and Remote Execution](#deploy)
* [Validation on PYNQ Board](#validation)
* [Throughput Test on PYNQ Board](#throughput)


We are almost done. With the bitfile built, we'll now package it together with its driver in a form suitable for use as a PYNQ overlay and deploy it to the board.

### Deployment and Remote Execution <a id='deploy'></a>

We'll now use the `DeployToPYNQ` transformation to create a deployment folder with the bitfile and driver file(s), and copy that to the PYNQ board. You can change the default IP address, username, password and target folder for the PYNQ below.

**Make sure you've [set up the SSH keys for your PYNQ board](https://finn-dev.readthedocs.io/en/latest/getting_started.html#pynq-board-first-time-setup) before executing this step.**

In [None]:
# import os

# # set up the following values according to your own environment
# # FINN will use ssh to deploy and run the generated accelerator
# ip = os.getenv("PYNQ_IP", "192.168.2.99")
# username = os.getenv("PYNQ_USERNAME", "xilinx")
# password = os.getenv("PYNQ_PASSWORD", "xilinx")
# port = os.getenv("PYNQ_PORT", 22)
# target_dir = os.getenv("PYNQ_TARGET_DIR", "/home/xilinx/finn_tfc_end2end_example")
# # set up ssh options to only allow publickey authentication
# options = "-o PreferredAuthentications=publickey -o PasswordAuthentication=no"

# # test access to PYNQ board
# ! ssh {options} {username}@{ip} -p {port} cat /var/run/motd.dynamic

In [None]:
# from finn.transformation.fpgadataflow.make_deployment import DeployToPYNQ

# model = model.transform(DeployToPYNQ(ip, port, username, password, target_dir))
# model.save(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")

Let's verify that the remote access credentials are saved in the model metadata, and that the deployment folder has been successfully copied to the board:

In [None]:
# model.model.metadata_props

In [None]:
# target_dir_pynq = target_dir + "/" + model.get_metadata_prop("pynq_deployment_dir").split("/")[-1]
# target_dir_pynq

In [None]:
# ! ssh {options} {username}@{ip} -p {port} 'ls -l {target_dir_pynq}'

We only have two more steps to be able to remotely execute the deployed bitfile with some test data from the MNIST dataset. Let's load up some test data that comes bundled with FINN.

In [None]:
# from pkgutil import get_data
# import onnx
# import onnx.numpy_helper as nph
# import matplotlib.pyplot as plt

# raw_i = get_data("finn.data", "onnx/mnist-conv/test_data_set_0/input_0.pb")
# x = nph.to_array(onnx.load_tensor_from_string(raw_i))
# plt.imshow(x.reshape(28,28), cmap='gray')

In [None]:
# model = ModelWrapper(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
# iname = model.graph.input[0].name
# oname = model.graph.output[0].name
# ishape = model.get_tensor_shape(iname)
# print("Expected network input shape is " + str(ishape))

Finally, we can call `execute_onnx` on the graph, which will internally call remote execution with the bitfile, grab the results and return a numpy array. You may recall that one "reshape" node was left out of the StreamingDataflowPartition. We'll do that manually with a numpy function call when passing in the input, but everything else in the network ended up inside the StreamingDataflowPartition so that's all we need to do.

In [None]:
# import numpy as np
# from finn.core.onnx_exec import execute_onnx

# input_dict = {iname: x.reshape(ishape)}
# ret = execute_onnx(model, input_dict)

In [None]:
# ret[oname]

We see that the network correctly predicts this as a digit 2.

### Validating the Accuracy on a PYNQ Board <a id='validation'></a>

All the command line prompts here are meant to be executed with `sudo` on the PYNQ board, so we'll use a workaround (`echo password | sudo -S command`) to get that working from this notebook running on the host computer.
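The sketch below shows one way the workaround could be assembled in Python instead of being typed by hand. The helper names `remote_sudo_command` and `ssh_argv` are hypothetical, and the connection values are the defaults used earlier in this notebook.

```python
import shlex


def remote_sudo_command(password, command):
    """Build the shell string 'echo <password> | sudo -S <command>'.

    The password is shell-quoted so that special characters survive intact.
    """
    return "echo {} | sudo -S {}".format(shlex.quote(password), command)


def ssh_argv(username, ip, port, remote_command, options=()):
    """Build an argument list for ssh that runs remote_command on the board."""
    target = "{}@{}".format(username, ip)
    return ["ssh", *options, "-t", target, "-p", str(port), remote_command]


cmd = remote_sudo_command("xilinx", "python3.6 validate.py --dataset mnist --batchsize 1000")
argv = ssh_argv("xilinx", "192.168.2.99", 22, cmd)
print(argv)
```

An argument list like this can be passed to `subprocess.run` without an extra layer of local shell quoting.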

**Ensure that your PYNQ board has a working internet connection for the next steps, since there is some downloading involved.**

To validate the accuracy, we first need to install the [`dataset-loading`](https://github.com/fbcotter/dataset_loading) Python package to the PYNQ board. This will give us a convenient way of downloading and accessing the MNIST dataset.


Command to execute on PYNQ:

```sudo pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading```

In [None]:
# ! ssh {options} -t {username}@{ip} -p {port} 'echo {password} | sudo -S pip3 install git+https://github.com/fbcotter/dataset_loading.git@0.0.4#egg=dataset_loading'

We can now use the `validate.py` script that was generated together with the driver to measure top-1 accuracy on the MNIST dataset.

Command to execute on PYNQ:

`python3.6 validate.py --dataset mnist --batchsize 1000`

In [None]:
# ! ssh {options} -t {username}@{ip} -p {port} 'cd {target_dir_pynq}; echo {password} | sudo -S python3.6 validate.py --dataset mnist --batchsize 1000'

We see that the final top-1 accuracy is 92.96%, which is very close to the 93.17% reported on the [BNN-PYNQ accuracy table in Brevitas](https://github.com/Xilinx/brevitas/tree/master/brevitas_examples/bnn_pynq). 
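Top-1 accuracy is the percentage of inputs whose predicted class equals the true label. The following is a minimal sketch of that computation, not the generated `validate.py` itself. The function name and the sample predictions and labels are made up for illustration.

```python
def top1_accuracy(predictions, labels):
    """Return the percentage of predictions that match their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return 100.0 * correct / len(labels)


# Hypothetical predicted and true digit classes for four images.
example_predictions = [2, 1, 0, 4]
example_labels = [2, 1, 0, 9]
print(top1_accuracy(example_predictions, example_labels))
```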

### Throughput Test on PYNQ Board <a id='throughput'></a>
In addition to the functional verification, FINN also offers the possibility to measure the network performance directly on the PYNQ board. This can be done using the core function `throughput_test_remote`. In the next cell we import the function and execute it.
First we load the deployed model again and pass it to the function together with the batch size. The function returns the metrics of the network as a dictionary.

In [None]:
# from finn.core.throughput_test import throughput_test_remote

# model = ModelWrapper(build_dir + "/tfc_w1_a1_pynq_deploy.onnx")
# res = throughput_test_remote(model, 10000)
# print("Network metrics:")
# for key in res:
#     print(str(key) + ": " + str(res[key]))

Together with the folding values, we can evaluate the performance of our accelerator. Each layer has a total folding factor of 64, and because the network is fully pipelined, it follows that `II = 64`. II is the initiation interval and indicates how many cycles are needed for one input to be processed. At a clock frequency of 100 MHz, the ideal throughput is therefore 100 / 64 ≈ 1.56 million frames per second (MFPS).

In [None]:
# II = 64
# # frequency in MHz
# f_MHz = 100
# # expected throughput in MFPS
# expected_throughput = f_MHz / II
# # measured throughput (FPS) from throughput test, converted to MFPS
# measured_throughput = res["throughput[images/s]"] * 0.000001
# # performance relative to the ideal throughput
# print("We reach approximately " + str(round((measured_throughput / expected_throughput)*100)) + "% of the ideal performance.")

The measured values were recorded with a batch size of 10000 and at a frequency of 100 MHz. We will be improving the efficiency of the generated accelerator examples in the coming FINN releases.
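The relation between clock frequency, initiation interval and ideal throughput can be written as a small standalone sketch. The function names are hypothetical, and `example_measured_fps` is a placeholder chosen only to exercise the arithmetic, not a measurement from any board.

```python
def ideal_throughput_fps(f_mhz, initiation_interval):
    """Ideal frames per second for a fully pipelined design.

    One input is accepted every `initiation_interval` cycles.
    """
    return f_mhz * 1e6 / initiation_interval


def efficiency_percent(measured_fps, f_mhz, initiation_interval):
    """Measured throughput as a percentage of the ideal throughput."""
    return 100.0 * measured_fps / ideal_throughput_fps(f_mhz, initiation_interval)


ideal = ideal_throughput_fps(100, 64)
print(ideal)  # 1562500.0 frames per second at 100 MHz with II = 64

# Placeholder value for illustration only; substitute the measured result.
example_measured_fps = 781250.0
print(efficiency_percent(example_measured_fps, 100, 64))
```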