# FINN Instrumentation Wrapper Flow (Part 1/2)
#### **NOTE: Make sure the Jupyter server was started in your FINN repository using the command `./run-docker notebook`, the FINN builds will fail otherwise.**

This Jupyter notebook will build a simple model and platform to be used in running the instrumentation wrapper.

## Build the model using the FINN compiler
The build flow is similar to those described in the other FINN and FINN-examples notebooks, but the following steps are added to the default `build_dataflow` steps:\
\
`test_step_gen_vitis_xo`\
`test_step_gen_instrumentation_wrapper`\
`test_step_gen_instrwrap_sim` \
`test_step_insert_tlastmarker` \
\
These steps generate additional output products: `.xo` kernel files, which Vitis uses to link the FINN design and the instrumentation wrapper into the hardware platform, and a Vivado testbench, which can be simulated to obtain values to compare against the hardware run. The TLastMarker step also inserts a node that marks the end of each output sample on the AXI stream. \
\
First, the necessary modules are imported and the model file is specified. The model used here is TFC-w1a1, trained on the MNIST dataset. The board name and FPGA part are also given: in this case we target the VMK180 from the Versal Prime series, though other Versal boards such as the VCK190 may also be compatible with this build flow. Finally, the platform's Vitis IP directory, into which the `.xo` files will be copied to build the Vitis platform, is specified.

In [1]:
##
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
##

import numpy as np
import os
import shutil
from qonnx.custom_op.registry import getCustomOp

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import finn.util.data_packing as dpk
from finn.custom_op.fpgadataflow.templates import ipgentcl_template
from finn.transformation.fpgadataflow.vitis_build import CreateVitisXO
from finn.util.hls import CallHLS
from finn.transformation.fpgadataflow.insert_tlastmarker import InsertTLastMarker
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP

model_file = "model.onnx"
model_name = "tfc_w1a1"

platform_name = "VMK180"
fpga_part = "xcvm1802-vsva2197-2MP-e-S"

vitis_ip_dir = "instr_wrap_platform/vitis/ip"

The aforementioned additional steps are then defined.
\
\
`test_step_gen_vitis_xo` will take the stitched model created using the FINN compiler and generate the `.xo` file for the FINN design.

In [2]:
def test_step_gen_vitis_xo(model, cfg):
    xo_dir = cfg.output_dir + "/xo"
    xo_dir = str(os.path.abspath(xo_dir))
    os.makedirs(xo_dir, exist_ok=True)
    model = model.transform(CreateVitisXO())
    xo_path = model.get_metadata_prop("vitis_xo")
    shutil.copy(xo_path, xo_dir)
    return model

`test_step_gen_instrumentation_wrapper` will first get the input and output properties of the FINN model. It will then use these values to fill out the template found in `templates/instrumentation_wrapper.template.cpp`, and save the filled template to an output file. It will also fill out a template and save a `.tcl` file for use in HLS synthesis of the instrumentation wrapper. These files will then be used to generate the `.xo` file for the instrumentation wrapper.

In [3]:
def test_step_gen_instrumentation_wrapper(model, cfg):
    xo_dir = cfg.output_dir + "/xo"
    xo_dir = str(os.path.abspath(xo_dir))
    os.makedirs(xo_dir, exist_ok=True)
    wrapper_output_dir = cfg.output_dir + "/instrumentation_wrapper"
    wrapper_output_dir = str(os.path.abspath(wrapper_output_dir))
    os.makedirs(wrapper_output_dir, exist_ok=True)
    # conservative max for pending feature maps: number of layers
    pending = len(model.graph.node)
    # query the parallelism-dependent folded input shape from the
    # node consuming the graph input
    inp_name = model.graph.input[0].name
    inp_node = getCustomOp(model.find_consumer(inp_name))
    inp_shape_folded = list(inp_node.get_folded_input_shape())
    inp_stream_width = inp_node.get_instream_width_padded()
    # number of beats per input is given by product of folded input
    # shape except the last dim (the number of elements packed per beat)
    ilen = np.prod(inp_shape_folded[:-1])
    ti = "ap_uint<%d>" % inp_stream_width
    # perform the same for the output
    out_name = model.graph.output[0].name
    out_node = getCustomOp(model.find_producer(out_name))
    out_shape_folded = list(out_node.get_folded_output_shape())
    out_stream_width = out_node.get_outstream_width_padded()
    olen = np.prod(out_shape_folded[:-1])
    to = "ap_uint<%d>" % out_stream_width
    ko = out_shape_folded[-1]
    # fill out instrumentation wrapper template
    with open("templates/instrumentation_wrapper.template.cpp", "r") as f:
        instrwrp_cpp = f.read()
    instrwrp_cpp = instrwrp_cpp.replace("@PENDING@", str(pending))
    instrwrp_cpp = instrwrp_cpp.replace("@ILEN@", str(ilen))
    instrwrp_cpp = instrwrp_cpp.replace("@OLEN@", str(olen))
    instrwrp_cpp = instrwrp_cpp.replace("@TI@", str(ti))
    instrwrp_cpp = instrwrp_cpp.replace("@TO@", str(to))
    instrwrp_cpp = instrwrp_cpp.replace("@KO@", str(ko))
    with open(wrapper_output_dir + "/top_instrumentation_wrapper.cpp", "w") as f:
        f.write(instrwrp_cpp)
    # fill out HLS synthesis tcl template
    prjname = "project_instrwrap"
    ipgentcl = ipgentcl_template
    ipgentcl = ipgentcl.replace("$PROJECTNAME$", prjname)
    ipgentcl = ipgentcl.replace("$HWSRCDIR$", wrapper_output_dir)
    ipgentcl = ipgentcl.replace("$TOPFXN$", "instrumentation_wrapper")
    ipgentcl = ipgentcl.replace("$FPGAPART$", cfg._resolve_fpga_part())
    ipgentcl = ipgentcl.replace("$CLKPERIOD$", str(cfg.synth_clk_period_ns))
    ipgentcl = ipgentcl.replace("$DEFAULT_DIRECTIVES$", "")
    ipgentcl = ipgentcl.replace("$EXTRA_DIRECTIVES$", "config_export -format xo")
    # use Vitis RTL kernel (.xo) output instead of IP-XACT
    ipgentcl = ipgentcl.replace("export_design -format ip_catalog", "export_design -format xo")
    with open(wrapper_output_dir + "/hls_syn.tcl", "w") as f:
        f.write(ipgentcl)
    # build bash script to launch HLS synth and call it
    code_gen_dir = wrapper_output_dir
    builder = CallHLS()
    builder.append_tcl(code_gen_dir + "/hls_syn.tcl")
    builder.set_ipgen_path(code_gen_dir + "/{}".format(prjname))
    builder.build(code_gen_dir)
    ipgen_path = builder.ipgen_path
    assert os.path.isdir(ipgen_path), "HLS IPGen failed: %s not found" % (ipgen_path)
    ip_path = ipgen_path + "/sol1/impl/ip"
    assert os.path.isdir(ip_path), "HLS IPGen failed: %s not found. Check log under %s" % (
        ip_path,
        code_gen_dir,
    )
    xo_path = code_gen_dir + "/{}/sol1/impl/export.xo".format(prjname)
    xo_instr_path = xo_dir + "/instrumentation_wrapper.xo"
    shutil.copy(xo_path, xo_instr_path)

    return model

`test_step_gen_instrwrap_sim` generates a testbench for the FINN design and instrumentation wrapper, consisting of a SystemVerilog testbench and a `.tcl` script that creates the Vivado simulation project. The script can be sourced in Vivado to run the testbench and obtain simulated performance results. The simulation also produces a checksum value, which can be compared with the value obtained from running on the hardware to verify that the design functions as expected.

In [5]:
def test_step_gen_instrwrap_sim(model, cfg):
    sim_output_dir = cfg.output_dir + "/instrwrap_sim"
    os.makedirs(sim_output_dir, exist_ok=True)
    # copy the testbench template verbatim (no placeholders are substituted)
    with open("templates/instrwrap_testbench.template.sv", "r") as f:
        testbench_sv = f.read()
    with open(sim_output_dir + "/instrwrap_testbench.sv", "w") as f:
        f.write(testbench_sv)
    # fill in testbench project creator template
    with open("templates/make_instrwrap_sim_proj.template.tcl", "r") as f:
        testbench_tcl = f.read()
    testbench_tcl = testbench_tcl.replace("@FPGA_PART@", cfg.fpga_part)
    with open(sim_output_dir + "/make_instrwrap_sim_proj.tcl", "w") as f:
        f.write(testbench_tcl)

    return model

`test_step_insert_tlastmarker` inserts a TLastMarker node into the model just before the final stitched IP is generated. This node generates the TLAST signal for the AXI stream output on the hardware: TLAST is asserted during the last transfer of each output sample to indicate that the output for that sample is complete and that a new one starts afterwards.

In [4]:
def test_step_insert_tlastmarker(model, cfg):
    model = model.transform(InsertTLastMarker(
        # only insert marker on output (input TLAST is ignored for these use-cases)
        both=False,
        # use ap_axiu instead of qdma_axis
        external=False,
        # static number of iterations (based on what the compiler/folding sets up)
        dynamic=False
    ))
    # give a proper name to the inserted node, important for codegen
    model.graph.node[-1].name = "TLastMarker_0"
    # re-run codegen and HLS IP gen, will affect only the new TLastMarker layer assuming
    # all other IPs have been generated already
    model = model.transform(PrepareIP(cfg._resolve_fpga_part(), cfg._resolve_hls_clk_period()))
    model = model.transform(HLSSynthIP())
    
    return model
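To illustrate the TLAST behaviour described above, here is a small behavioural model in plain Python. It is not FINN's HLS implementation of TLastMarker; it simply assumes a fixed number of output beats per sample, as with `dynamic=False`, and the function name `tlast_flags` is hypothetical.

```python
def tlast_flags(num_beats, beats_per_sample):
    """Return the TLAST value for each output beat.

    Assumes a static number of beats per sample: TLAST is high only on the
    last beat of each sample and low on all other beats.
    """
    return [(i % beats_per_sample) == beats_per_sample - 1 for i in range(num_beats)]


# two samples of three beats each
print(tlast_flags(6, 3))  # [False, False, True, False, False, True]
```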

With the additional steps defined, they can then be appended to the build flow. The other necessary configurations for the build will also be set.

In [6]:
build_steps = build_cfg.default_build_dataflow_steps + [
    test_step_gen_vitis_xo,
    test_step_gen_instrumentation_wrapper,
    test_step_gen_instrwrap_sim,
]

# insert tlast marker before stitched ip
step_stitchedip_ind = build_steps.index("step_create_stitched_ip")
build_steps.insert(step_stitchedip_ind, test_step_insert_tlastmarker)
build_steps.remove("step_specialize_to_rtl")

cfg = build.DataflowBuildConfig(
    steps=build_steps,
    board=platform_name,
    fpga_part=fpga_part,
    output_dir="output_%s_%s" % (model_name, platform_name),
    synth_clk_period_ns=3.3,
    folding_config_file="folding_config.json",
    stitched_ip_gen_dcp=False,
    generate_outputs=[
        build_cfg.DataflowOutputType.STITCHED_IP,
    ],
    save_intermediate_models=True,
)

Finally, the build will be launched. This will take a few minutes.

In [7]:
build.build_dataflow_cfg(model_file, cfg)

Building dataflow accelerator from model.onnx
Intermediate outputs will be generated in /scratch/hannayan/builds
Final outputs will be generated in output_tfc_w1a1_VMK180
Build log is at output_tfc_w1a1_VMK180/build_dataflow.log
Running step: step_qonnx_to_finn [1/23]
Running step: step_tidy_up [2/23]
Running step: step_streamline [3/23]
Running step: step_convert_to_hls [4/23]
Running step: step_create_dataflow_partition [5/23]
Running step: step_target_fps_parallelization [6/23]
Running step: step_apply_folding_config [7/23]
Running step: step_minimize_bit_width [8/23]
Running step: step_generate_estimate_reports [9/23]
Running step: step_hls_codegen [10/23]
Running step: step_hls_ipgen [11/23]
Running step: step_measure_nodebynode_rtlsim_performance [12/23]
Running step: step_set_fifo_depths [13/23]
Running step: custom_step_insert_tlastmarker [14/23]
Running step: step_create_stitched_ip [15/23]
Running step: step_measure_rtlsim_performance [16/23]
Running step: step_out_of_context_synthesis [17/23]
Running step: step_synthesize_bitfile [18/23]
Running step: step_make_pynq_driver [19/23]
Running step: step_deployment_package [20/23]
Running step: test_step_gen_vitis_xo [21/23]
Running step: test_step_gen_instrumentation_wrapper [22/23]
Running step: custom_step_gen_instrwrap_sim [23/23]
Completed successfully


0

The build outputs, including the intermediate models and estimate reports, can be found in the `output_tfc_w1a1_<PLATFORM_NAME>` folder.

The generated instrumentation wrapper testbench can be found in the `instrwrap_sim` subfolder of the build output directory. The testbench can be run by sourcing `make_instrwrap_sim_proj.tcl` in Vivado. \
\
Vivado prints a large amount of output while running the simulation, most of it about building and compiling the project components, which is rarely of interest. To keep the notebook outputs tidy, the Vivado output is therefore redirected to `/dev/null`.

In [12]:
%%sh
cd output_tfc_w1a1_VMK180/instrwrap_sim
vivado -mode batch -source make_instrwrap_sim_proj.tcl >/dev/null 2>&1

After the simulation has completed, its output is also written to a log file within the Vivado project folder, which can be viewed to see the simulation result.

In [13]:
!cat output_tfc_w1a1_VMK180/instrwrap_sim/instr_sim_proj/instr_sim_proj.sim/sim_1/behav/xsim/simulate.log

Time resolution is 1 ps
Reset complete
[t=2095.00 ns] STATUS_I = 0
[t=2115.00 ns] STATUS_O = 0
[t=2135.00 ns] LATENCY = 0
[t=2155.00 ns] INTERVAL = 0
[t=2175.00 ns] CHECKSUM = 0
[t=12195.00 ns] STATUS_I = 0
[t=12215.00 ns] STATUS_O = 0
[t=12235.00 ns] LATENCY = 332
[t=12255.00 ns] INTERVAL = 401
[t=12275.00 ns] CHECKSUM = a000005
Nonzero checksum detected, stopping simulation
$finish called at time : 12275 ns : File "/scratch/hannayan/finn/notebooks/instrumentation_wrapper/output_tfc_w1a1_VMK180/instrwrap_sim/instrwrap_testbench.sv" Line 155


`STATUS_I` and `STATUS_O` are used to check that no timestamp overflow or underflow has occurred, so that the time interval between samples is equal. `INTERVAL` is the number of cycles taken to read in and process one input, and `LATENCY` is the number of cycles taken to process an input and produce an output. `CHECKSUM` is a verification value for checking that the model works as expected; it should be compared with the value obtained when running the instrumentation wrapper on hardware. The testbench reads these values repeatedly and stops the simulation once a nonzero checksum is detected.
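To compare the simulated values with a later hardware run, it can help to extract them from `simulate.log` programmatically. The sketch below assumes the log line format shown above (`[t=... ns] NAME = value`), that `CHECKSUM` is printed in hex and the other values in decimal, and that the last reported value is the one of interest. The cycle-to-time conversion uses the 200 MHz `AP_CLK_MHZ` value from the platform Makefile as an example; the clock actually used in simulation may differ. `parse_sim_log` and `cycles_to_us` are hypothetical helper names.

```python
import re

# Matches lines such as "[t=12235.00 ns] LATENCY = 332"
LINE_RE = re.compile(r"\[t=([\d.]+) ns\]\s+(\w+)\s*=\s*([0-9a-fA-F]+)")


def parse_sim_log(text):
    """Return the last reported value of each register in a simulate.log.

    CHECKSUM is kept as the hex string printed in the log; the other values
    are converted to int, assuming they are printed in decimal.
    """
    results = {}
    for match in LINE_RE.finditer(text):
        name, value = match.group(2), match.group(3)
        results[name] = value if name == "CHECKSUM" else int(value)
    return results


def cycles_to_us(cycles, clk_mhz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clk_mhz


log = """[t=12235.00 ns] LATENCY = 332
[t=12255.00 ns] INTERVAL = 401
[t=12275.00 ns] CHECKSUM = a000005"""
res = parse_sim_log(log)
print(res)  # {'LATENCY': 332, 'INTERVAL': 401, 'CHECKSUM': 'a000005'}
print(cycles_to_us(res["INTERVAL"], 200))  # 2.005
```

To use it on the real log, pass the contents of `output_tfc_w1a1_VMK180/instrwrap_sim/instr_sim_proj/instr_sim_proj.sim/sim_1/behav/xsim/simulate.log` to `parse_sim_log`.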

## Build the instrumentation wrapper platform
With the FINN model built, the generated `.xo` files can be used to build the platform on which the instrumentation wrapper will be run. This is done through Makefiles and the corresponding `make` commands. \
\
The user-editable variables used in the build are defined in the top-level `Makefile` in the `instr_wrap_platform` folder.

In [15]:
!sed -n 17,37p instr_wrap_platform/Makefile

###############################################
# Variables that may be changed to your needs #
###############################################

# ILA_EN = 0 (Disabled) or 1 (Enabled)
export ILA_EN                     := 0

# BOARD_NAME = vmk180, vck190
export BOARD_NAME                 := vmk180

# For remote hardware server, set the network protocol, hostname and port to connect to
export HW_SERVER_HOST             := xirdcglab52
export HW_SERVER_PORT             := 3121

# Set if design is single- or double-pumped (1 if double-pumped, 0 if single-pumped)
export DOUBLE_PUMPED              := 0

# Set frequencies of the clocks in the platform
export AP_CLK_MHZ                 := 200
export AP_CLK_2X_MHZ              := 400



The target for this build is hardware, and the Vivado ILA (integrated logic analyser) is disabled here, though it can be useful for debugging signals if needed. The target board is the VMK180, which is connected to a remote machine running a `hw_server`. The design is single-pumped (driven by a single clock) at 200 MHz. `AP_CLK_2X_MHZ` is not used in this case; it is generally set to twice `AP_CLK_MHZ`. \
\
These default values are sufficient for the build alone. However, running the instrumentation wrapper requires connecting to the board via the `hw_server`, so `HW_SERVER_HOST` and `HW_SERVER_PORT` should be changed to match your own `hw_server`. To do this, open the `Makefile` in the `instr_wrap_platform` folder from the Jupyter file browser, then edit and save it.
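As an alternative to editing the file by hand, the variables can be updated programmatically. The sketch below is a hypothetical helper (not part of the platform repository) that assumes each variable is defined on a single line of the form `export NAME := value`, as in the excerpt above; `my-hw-server` is a placeholder hostname.

```python
import re


def set_makefile_var(makefile_text, name, value):
    """Return makefile_text with the line `export NAME := ...` set to value.

    Assumes a single-line definition like `export HW_SERVER_HOST := host`.
    Raises KeyError if the variable is not found.
    """
    pattern = re.compile(
        r"^(export[ \t]+%s[ \t]*:=[ \t]*).*$" % re.escape(name), re.MULTILINE
    )
    # a function replacement avoids backslash escapes being interpreted in value
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), makefile_text)
    if count == 0:
        raise KeyError("variable %s not found" % name)
    return new_text


text = (
    "export HW_SERVER_HOST             := xirdcglab52\n"
    "export HW_SERVER_PORT             := 3121\n"
)
text = set_makefile_var(text, "HW_SERVER_HOST", "my-hw-server")
print(text)
```

On the real file, read `instr_wrap_platform/Makefile`, apply `set_makefile_var` for each variable, and write the result back. Matching the `:=` immediately after the name keeps, for example, `AP_CLK_MHZ` from also matching `AP_CLK_2X_MHZ`.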

To start with the platform build, first the `.xo` files will be copied to the Vitis IP folder.

In [16]:
%%sh
cp output_tfc_w1a1_VMK180/xo/finn_design.xo instr_wrap_platform/vitis/ip/finn_design/src
cp output_tfc_w1a1_VMK180/xo/instrumentation_wrapper.xo instr_wrap_platform/vitis/ip/instrumentation_wrapper/src

Then, the necessary `make` commands are run from the root of the platform directory. Running `make help` gives a brief explanation of what each `make` rule does.

In [17]:
%%sh
cd instr_wrap_platform
make help

Makefile Usage:
  make all
      Command to generate everything for this design

  make version_check
      checks out if the correct tools/versions are enabled

  make vivado_platform
      Builds a Vivado custom base HW platform using Pre-Synth flow
      To run full implementation platform, override using environment variable
      PRE_SYNTH = False

  make vitis_platform
      Builds the Vitis platform
      * Depends on vivado_platform rule to be completed

  make vitis_ip
      Compile RTL and HLS kernels
      * Depends on vitis_platform rule to be completed

  make full_impl
      Extends and links the HW Platform with RTL and HLS kernels using Vitis v++ linker
      Synthesize and Implements the complete design
      * Depends on vitis_ip rule to be completed

  make run_instr_wrap
      Exports the HW platform and programs the HW device through Vivado
      Builds and runs the instrumentation wrapper on the complete platform through Vitis
      Outputs the results to a serial

\
If there were any builds run previously, `make clean` should be run to remove all outputs generated by previous builds, so that a fresh build can be started. This prevents old or partial builds from potentially interfering with the new build.

In [18]:
%%sh
cd instr_wrap_platform
make clean

make clean -C vivado
make[1]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vivado'
rm -rf build
rm -rf .Xil vivado* .crash*
make[1]: Leaving directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vivado'
make clean_vitis
make[1]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform'
make clean -C vitis/xpfm_export
make[2]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/xpfm_export'
rm -rf ./build
rm -rf ./.Xil
make[2]: Leaving directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/xpfm_export'
make clean -C vitis/ip
make[2]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/ip'
make clean -C finn_design
make[3]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/

To build the instrumentation wrapper, the 4 `make` commands \
`make vivado_platform`\
`make vitis_platform`\
`make vitis_ip`\
`make full_impl`\
must be run in succession.

Alternatively, instead of running each command separately, the `make all` command can be used to run the aforementioned 4 steps in succession. \
\
The build will take a few minutes to complete.
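If you prefer to drive the four targets individually from Python, for example to see more easily which stage failed, a minimal sketch using `subprocess.run` is shown below. `build_platform` is a hypothetical helper; it assumes it is run from the notebook folder so that `instr_wrap_platform` is the platform directory, and with `dry_run=True` it only returns the commands without running them.

```python
import subprocess

# The four targets listed above, in the order they must be run
PLATFORM_TARGETS = ("vivado_platform", "vitis_platform", "vitis_ip", "full_impl")


def build_platform(platform_dir="instr_wrap_platform", targets=PLATFORM_TARGETS,
                   dry_run=False):
    """Run each make target in turn, stopping at the first failure.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [["make", target] for target in targets]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if make exits non-zero
            subprocess.run(cmd, cwd=platform_dir, check=True)
    return commands


for cmd in build_platform(dry_run=True):
    print(" ".join(cmd))
```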

In [19]:
%%sh
cd instr_wrap_platform
make all

AMD TOOLS & VERSION CHECK SUCCESSFUL
make platform_classic -C vivado
make[1]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vivado'
vivado -mode batch -source xsa_platform_classic.tcl -tclargs vmk180 vmk180_thin xcvm1802-vsva2197-2MP-e-S true vmk180 3.1 200 400 2022.2

****** Vivado v2022.2 (64-bit)
  **** SW Build 3671981 on Fri Oct 14 04:59:54 MDT 2022
  **** IP Build 3669848 on Fri Oct 14 08:30:02 MDT 2022
    ** Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

Sourcing tcl script '/tmp/home_dir/.Xilinx/Vivado/Vivado_init.tcl'
484 Beta devices matching pattern found, 484 enabled.
enable_beta_device: Time (s): cpu = 00:00:10 ; elapsed = 00:00:37 . Memory (MB): peak = 2045.352 ; gain = 132.512 ; free physical = 45875 ; free virtual = 501647
source xsa_platform_classic.tcl
# namespace eval _tcl {
#   proc get_script_folder {} {
#     set script_path [file normalize [info script]]
#     set script_folder [file dirname $script_path


Running Make libs in psv_cortexa72_0/libsrc/standalone_v8_0/src

make -C psv_cortexa72_0/libsrc/standalone_v8_0/src -s libs  "SHELL=/bin/sh" "COMPILER=aarch64-none-elf-gcc" "ASSEMBLER=aarch64-none-elf-as" "ARCHIVER=aarch64-none-elf-ar" "COMPILER_FLAGS=  -O2 -c" "EXTRA_COMPILER_FLAGS=-g -Wall -Wextra -Dversal -DARMA72_EL3 -fno-tree-loop-distribute-patterns"

Running Make libs in psv_cortexa72_0/libsrc/sysmonpsv_v3_1/src

make -C psv_cortexa72_0/libsrc/sysmonpsv_v3_1/src -s libs  "SHELL=/bin/sh" "COMPILER=aarch64-none-elf-gcc" "ASSEMBLER=aarch64-none-elf-as" "ARCHIVER=aarch64-none-elf-ar" "COMPILER_FLAGS=  -O2 -c" "EXTRA_COMPILER_FLAGS=-g -Wall -Wextra -Dversal -DARMA72_EL3 -fno-tree-loop-distribute-patterns"

Running Make libs in psv_cortexa72_0/libsrc/trngpsv_v1_2/src

make -C psv_cortexa72_0/libsrc/trngpsv_v1_2/src -s libs  "SHELL=/bin/sh" "COMPILER=aarch64-none-elf-gcc" "ASSEMBLER=aarch64-none-elf-as" "ARCHIVER=aarch64-none-elf-ar" "COMPILER_FLAGS=  -O2 -c" "EXTRA_COMPILER_FLAGS

xtime_l.c:49:9: note: '#pragma message: For the sleep routines, Global timer is being used'
   49 | #pragma message ("For the sleep routines, Global timer is being used")
      |         ^~~~~~~


Finished building libraries parallelly.

make --no-print-directory archive

aarch64-none-elf-ar -r  psv_cortexa72_0/lib/libxil.a psv_cortexa72_0/lib/CompactAES.o psv_cortexa72_0/lib/_exit.o psv_cortexa72_0/lib/_open.o psv_cortexa72_0/lib/_sbrk.o psv_cortexa72_0/lib/abort.o psv_cortexa72_0/lib/asm_vectors.o psv_cortexa72_0/lib/boot.o psv_cortexa72_0/lib/close.o psv_cortexa72_0/lib/cpputest_time.o psv_cortexa72_0/lib/errno.o psv_cortexa72_0/lib/fcntl.o psv_cortexa72_0/lib/fstat.o psv_cortexa72_0/lib/getpid.o psv_cortexa72_0/lib/inbyte.o psv_cortexa72_0/lib/initialise_monitor_handles.o psv_cortexa72_0/lib/isatty.o psv_cortexa72_0/lib/kill.o psv_cortexa72_0/lib/lseek.o psv_cortexa72_0/lib/open.o psv_cortexa72_0/lib/outbyte.o psv_cortexa72_0/lib/print.o psv_cortexa72_0/lib/putnum.o psv_cortexa72_0/lib/read.o psv_cortexa72_0/lib/sbrk.o psv_cortexa72_0/lib/sleep.o psv_cortexa72_0/lib/time.o psv_cortexa72_0/lib/translation_table.o psv_cortexa72_0/lib/unlink.o psv_cortexa72_0/lib/vectors

/proj/xbuilds/SWIP/2022.2_1014_8888/installs/lin64/Vitis/2022.2/gnu/aarch64/lin/aarch64-none/bin/../x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ar.real: creating psv_cortexa72_0/lib/libxil.a


Finished building libraries

make[2]: Leaving directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/xpfm_export/build/platform/vmk180_thin/psv_cortexa72_0/standalone_domain/bsp'

make[1]: Leaving directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/xpfm_export'
make all -C vitis/ip
make[1]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/ip'
mkdir    xo_hw
make all -C finn_design
make[2]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/ip/finn_design'
cp ./src/finn_design.xo ../xo_hw/finn_design.xo
make[2]: Leaving directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/ip/finn_design'
make all -C instrumentation_wrapper
make[2]: Entering directory '/scratch/hannayan/finn/notebooks/instrumentation_wrapper/instr_wrap_platform/vitis/ip/instrumentation_wrapper'
c

Once the build has finished, the generated outputs can be found in the `instr_wrap_platform/vitis/build_hw` folder. The full Vivado project `prj.xpr` can be found in `build_hw/_x/link/vivado/vpl/prj`. The final platform block design can be viewed in this project. Vivado features such as utilisation reports can also be run to view the resource usage and other metrics of the platform.

## Alternative Method: Build the instrumentation wrapper as part of the FINN build flow
Instead of building the platform separately from the FINN model, additional steps can be appended to the FINN build flow to build the platform once the FINN model has been compiled and the corresponding `.xo` files have been generated. These steps simply call the necessary `make` commands through the Python `subprocess` module, so that the FINN model and the instrumentation wrapper platform are built in one go rather than in separate runs. The cell below shows how this can be done; to run it, remove the `"""` at the start and the `""";` at the end.

In [None]:
"""
# An alternative method of building the instrumentation wrapper platform
# by appending steps to call the `make` commands to the FINN build flow

import numpy as np
import os
import shutil
import subprocess
from qonnx.custom_op.registry import getCustomOp

import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import finn.util.data_packing as dpk
from finn.custom_op.fpgadataflow.templates import ipgentcl_template
from finn.transformation.fpgadataflow.vitis_build import CreateVitisXO
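from finn.util.hls import CallHLS
from finn.transformation.fpgadataflow.insert_tlastmarker import InsertTLastMarker
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP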

model_file = "model.onnx"
model_name = "tfc_w1a1"

platform_name = "VMK180"
fpga_part = "xcvm1802-vsva2197-2MP-e-S"

vitis_ip_dir = "instr_wrap_platform/vitis/ip"


def test_step_gen_vitis_xo(model, cfg):
    xo_dir = cfg.output_dir + "/xo"
    xo_dir = str(os.path.abspath(xo_dir))
    os.makedirs(xo_dir, exist_ok=True)
    model = model.transform(CreateVitisXO())
    xo_path = model.get_metadata_prop("vitis_xo")
    shutil.copy(xo_path, xo_dir)
    return model

def test_step_gen_instrumentation_wrapper(model, cfg):
    xo_dir = cfg.output_dir + "/xo"
    xo_dir = str(os.path.abspath(xo_dir))
    os.makedirs(xo_dir, exist_ok=True)
    wrapper_output_dir = cfg.output_dir + "/instrumentation_wrapper"
    wrapper_output_dir = str(os.path.abspath(wrapper_output_dir))
    os.makedirs(wrapper_output_dir, exist_ok=True)
    # conservative max for pending feature maps: number of layers
    pending = len(model.graph.node)
    # query the parallelism-dependent folded input shape from the
    # node consuming the graph input
    inp_name = model.graph.input[0].name
    inp_node = getCustomOp(model.find_consumer(inp_name))
    inp_shape_folded = list(inp_node.get_folded_input_shape())
    inp_stream_width = inp_node.get_instream_width_padded()
    # number of beats per input is given by product of folded input
    # shape except the last dim (the number of elements packed per beat)
    ilen = np.prod(inp_shape_folded[:-1])
    ti = "ap_uint<%d>" % inp_stream_width
    # perform the same for the output
    out_name = model.graph.output[0].name
    out_node = getCustomOp(model.find_producer(out_name))
    out_shape_folded = list(out_node.get_folded_output_shape())
    out_stream_width = out_node.get_outstream_width_padded()
    olen = np.prod(out_shape_folded[:-1])
    to = "ap_uint<%d>" % out_stream_width
    ko = out_shape_folded[-1]
    # fill out instrumentation wrapper template
    with open("templates/instrumentation_wrapper.template.cpp", "r") as f:
        instrwrp_cpp = f.read()
    instrwrp_cpp = instrwrp_cpp.replace("@PENDING@", str(pending))
    instrwrp_cpp = instrwrp_cpp.replace("@ILEN@", str(ilen))
    instrwrp_cpp = instrwrp_cpp.replace("@OLEN@", str(olen))
    instrwrp_cpp = instrwrp_cpp.replace("@TI@", str(ti))
    instrwrp_cpp = instrwrp_cpp.replace("@TO@", str(to))
    instrwrp_cpp = instrwrp_cpp.replace("@KO@", str(ko))
    with open(wrapper_output_dir + "/top_instrumentation_wrapper.cpp", "w") as f:
        f.write(instrwrp_cpp)
    # fill out HLS synthesis tcl template
    prjname = "project_instrwrap"
    ipgentcl = ipgentcl_template
    ipgentcl = ipgentcl.replace("$PROJECTNAME$", prjname)
    ipgentcl = ipgentcl.replace("$HWSRCDIR$", wrapper_output_dir)
    ipgentcl = ipgentcl.replace("$TOPFXN$", "instrumentation_wrapper")
    ipgentcl = ipgentcl.replace("$FPGAPART$", cfg._resolve_fpga_part())
    ipgentcl = ipgentcl.replace("$CLKPERIOD$", str(cfg.synth_clk_period_ns))
    ipgentcl = ipgentcl.replace("$DEFAULT_DIRECTIVES$", "")
    ipgentcl = ipgentcl.replace("$EXTRA_DIRECTIVES$", "config_export -format xo")
    # use Vitis RTL kernel (.xo) output instead of IP-XACT
    ipgentcl = ipgentcl.replace("export_design -format ip_catalog", "export_design -format xo")
    with open(wrapper_output_dir + "/hls_syn.tcl", "w") as f:
        f.write(ipgentcl)
    # build bash script to launch HLS synth and call it
    code_gen_dir = wrapper_output_dir
    builder = CallHLS()
    builder.append_tcl(code_gen_dir + "/hls_syn.tcl")
    builder.set_ipgen_path(code_gen_dir + "/{}".format(prjname))
    builder.build(code_gen_dir)
    ipgen_path = builder.ipgen_path
    assert os.path.isdir(ipgen_path), "HLS IPGen failed: %s not found" % (ipgen_path)
    ip_path = ipgen_path + "/sol1/impl/ip"
    assert os.path.isdir(ip_path), "HLS IPGen failed: %s not found. Check log under %s" % (
        ip_path,
        code_gen_dir,
    )
    xo_path = code_gen_dir + "/{}/sol1/impl/export.xo".format(prjname)
    xo_instr_path = xo_dir + "/instrumentation_wrapper.xo"
    shutil.copy(xo_path, xo_instr_path)

    return model

def test_step_insert_tlastmarker(model, cfg):
    model = model.transform(InsertTLastMarker(
        # only insert marker on output (input TLAST is ignored for these use-cases)
        both=False,
        # use ap_axiu instead of qdma_axis
        external=False,
        # static number of iterations (based on what the compiler/folding sets up)
        dynamic=False
    ))
    # give a proper name to the inserted node, important for codegen
    model.graph.node[-1].name = "TLastMarker_0"
    # re-run codegen and HLS IP gen, will affect only the new TLastMarker layer assuming
    # all other IPs have been generated already
    model = model.transform(PrepareIP(cfg._resolve_fpga_part(), cfg._resolve_hls_clk_period()))
    model = model.transform(HLSSynthIP())
    
    return model

def test_step_gen_instrwrap_sim(model, cfg):
    sim_output_dir = cfg.output_dir + "/instrwrap_sim"
    os.makedirs(sim_output_dir, exist_ok=True)
    # copy the testbench template verbatim (no placeholders are substituted)
    with open("templates/instrwrap_testbench.template.sv", "r") as f:
        testbench_sv = f.read()
    with open(sim_output_dir + "/instrwrap_testbench.sv", "w") as f:
        f.write(testbench_sv)
    # fill in testbench project creator template
    with open("templates/make_instrwrap_sim_proj.template.tcl", "r") as f:
        testbench_tcl = f.read()
    testbench_tcl = testbench_tcl.replace("@FPGA_PART@", cfg.fpga_part)
    with open(sim_output_dir + "/make_instrwrap_sim_proj.tcl", "w") as f:
        f.write(testbench_tcl)

    return model


# Steps for exporting the .xo files and running the make commands
# to build the platform, using the subprocess module
def test_step_export_xo(model, cfg):
    # Copy the generated .xo files to their respective Vitis IP source directories;
    # shutil.copy raises an error (stopping the build) if a file is missing
    for kernel in ["finn_design", "instrumentation_wrapper"]:
        shutil.copy(cfg.output_dir + "/xo/%s.xo" % kernel, vitis_ip_dir + "/%s/src" % kernel)
    return model

def test_step_build_platform(model, cfg):
    # Clean any previous/partial builds and then build the full platform;
    # check=True raises CalledProcessError (stopping the build) if make fails
    subprocess.run("make clean && make all", shell=True, cwd="instr_wrap_platform", check=True)
    return model


# Append the steps needed to build the platform
build_steps = build_cfg.default_build_dataflow_steps + [
    test_step_gen_vitis_xo,
    test_step_gen_instrumentation_wrapper,
    test_step_gen_instrwrap_sim,
    test_step_export_xo,
    test_step_build_platform,
]

# insert tlast marker before stitched ip
step_stitchedip_ind = build_steps.index("step_create_stitched_ip")
build_steps.insert(step_stitchedip_ind, test_step_insert_tlastmarker)
build_steps.remove("step_specialize_to_rtl")

cfg = build.DataflowBuildConfig(
    steps=build_steps,
    board=platform_name,
    fpga_part=fpga_part,
    output_dir="output_%s_%s" % (model_name, platform_name),
    synth_clk_period_ns=3.3,
    folding_config_file="folding_config.json",
    stitched_ip_gen_dcp=False,
    generate_outputs=[
        build_cfg.DataflowOutputType.STITCHED_IP,
    ],
    save_intermediate_models=True,
)
model_file = "model.onnx"
build.build_dataflow_cfg(model_file, cfg)
""";

## Next steps
Once all the cells have finished running, the necessary builds will be complete. **Due to a bug in the Vitis XSCT tools, the instrumentation wrapper cannot be run from the notebook. It must be run from outside the notebook (e.g. from the command line that the Jupyter notebook server was started from).** \
\
The next notebook (`2-run_instr_wrap.ipynb`) details how to run the instrumentation wrapper from the command line. Its code cells will not function as intended when run from the notebook; they are included only to show the commands needed to run the instrumentation wrapper.