# FINN - Functional Verification of End-to-End Flow
-----------------------------------------------------------------

**Important: This notebook depends on the tfc_end2end_example notebook, because we are using models that are available at intermediate steps in the end-to-end flow. Please make sure the needed .onnx files have been generated before running this notebook.**

In this notebook, we show how to take the intermediate results of the end-to-end TFC example and verify their functionality with different methods. The following picture shows the *Simulation & Emulation Flows* section of the end-to-end flow. Besides the methods in this notebook, there is another one that is covered in the Jupyter notebook [tfc_end2end_example](tfc_end2end_example.ipynb): remote execution. Remote execution allows functional verification directly on the PYNQ board; for details, please have a look at that notebook.

<img src="verification.png" alt="Drawing" style="width: 500px;"/>

We will use the following helper functions: `showSrc` to show the source code of FINN library calls and `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).

In [1]:
%load_ext autoreload
%autoreload 2
from finn.util.basic import make_build_dir
from finn.util.visualization import showSrc, showInNetron
import os

build_dir = os.environ["FINN_BUILD_DIR"]

To verify the simulations, a "golden" output is calculated as a reference. This is calculated directly from the Brevitas model using PyTorch, by running some example data from the MNIST dataset through the trained model.

In [2]:
from pkgutil import get_data
import onnx
import onnx.numpy_helper as nph
import torch
from finn.util.test import get_test_model_trained

fc = get_test_model_trained("TFC", 1, 1)
raw_i = get_data("qonnx.data", "onnx/mnist-conv/test_data_set_0/input_0.pb")
input_tensor = onnx.load_tensor_from_string(raw_i)
input_brevitas = torch.from_numpy(nph.to_array(input_tensor).copy()).float()
output_golden = fc.forward(input_brevitas).detach().numpy()
output_golden

array([[-1.4090618, -1.3267527,  0.9779036, -1.2444434, -1.4090618,
        -1.6559892, -1.3267527, -1.4913709, -1.3267527, -1.6559892]],
      dtype=float32)

## Simulation using Python <a id='simpy'></a>

If an ONNX model consists of [standard ONNX](https://github.com/onnx/onnx/blob/main/docs/Operators.md) nodes and/or FINN custom operations that do not belong to the fpgadataflow backend (`backend` $\neq$ `fpgadataflow`), this model can be checked for functionality using Python.

Standard ONNX nodes are simulated with [onnxruntime](https://github.com/microsoft/onnxruntime), an open-source tool developed by Microsoft for running standard ONNX nodes. For FINN custom op nodes, dedicated execution functions are defined. The following is the execution function of an XNOR-popcount node.


In [3]:
from qonnx.custom_op.general.xnorpopcount import xnorpopcountmatmul
showSrc(xnorpopcountmatmul)

def xnorpopcountmatmul(inp0, inp1):
    """Simulates XNOR-popcount matrix multiplication as a regular bipolar
    matrix multiplication followed by some post processing."""
    # extract the operand shapes
    # (M, K0) = inp0.shape
    # (K1, N) = inp1.shape
    K0 = inp0.shape[-1]
    K1 = inp1.shape[0]
    # make sure shapes are compatible with matmul
    assert K0 == K1, "Matrix shapes are not compatible with matmul."
    K = K0
    # convert binary inputs to bipolar
    inp0_bipolar = 2.0 * inp0 - 1.0
    inp1_bipolar = 2.0 * inp1 - 1.0
    # call regular numpy matrix multiplication
    out = np.matmul(inp0_bipolar, inp1_bipolar)
    # XNOR-popcount does not produce the regular dot product result --
    # it returns the number of +1s after XNOR. let P be the number of +1s
    # and N be the number of -1s. XNOR-popcount returns P, whereas the
    # regular dot product result from numpy is P-N, so we need to apply
    # some correction.
    # out = P-N
    # K = P+N
    # out + K = 

The function contains a description of the behaviour in Python and can thus calculate the result of the node.
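The correction described in the comments can be checked independently. With `out = P - N` and `K = P + N`, adding the two gives `out + K = 2P`, so the popcount is `P = (out + K) / 2`. The following is a minimal sketch of that identity, not the qonnx source (whose listing above is cut off); the function names `xnor_popcount_reference` and `xnor_popcount_via_bipolar` are hypothetical and exist only for this illustration.

```python
import numpy as np


def xnor_popcount_reference(a_bits, b_bits):
    """Count matching bit positions (XNOR, then popcount) per row/column pair.

    a_bits has shape (M, K), b_bits has shape (K, N), entries are 0 or 1.
    """
    # Broadcast to (M, K, N) and count the positions where the bits agree.
    matches = a_bits[:, :, None] == b_bits[None, :, :]
    return matches.sum(axis=1)


def xnor_popcount_via_bipolar(a_bits, b_bits):
    """Derive the same count from a regular bipolar matrix multiplication."""
    K = a_bits.shape[-1]
    # Map {0, 1} to {-1, +1}; the dot product then equals P - N.
    out = np.matmul(2.0 * a_bits - 1.0, 2.0 * b_bits - 1.0)
    # out = P - N and K = P + N, hence P = (out + K) / 2.
    return (out + K) / 2.0


rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(3, 8))
b = rng.integers(0, 2, size=(8, 4))
reference = xnor_popcount_reference(a, b)
via_bipolar = xnor_popcount_via_bipolar(a, b)
print(np.array_equal(reference, via_bipolar))
```

Both functions should agree for any binary inputs with compatible shapes, which is why a regular matrix multiplication plus a correction can stand in for XNOR-popcount in simulation.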

This execution function and onnxruntime are used when `execute_onnx` from `onnx_exec` is applied to the model. The model is then simulated node by node and the results are stored in a context dictionary, which contains the values of each tensor at the end of the execution. To get the result, only the output tensor has to be extracted.
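To make the idea of a context dictionary concrete, here is a toy node-by-node executor. It is only an illustration of the principle and is not FINN's implementation: the node representation (dicts with `inputs`, `output` and `func`) and the function `execute_graph` are assumptions made up for this sketch, and only the parameter name `return_full_exec_context` is borrowed from the `execute_onnx` call used in this notebook.

```python
import numpy as np


def execute_graph(nodes, input_dict, return_full_exec_context=False):
    """Run nodes in order, storing every produced tensor in a context dict."""
    context = dict(input_dict)
    for node in nodes:
        # Look up the input tensors by name, run the node, store its output.
        args = [context[name] for name in node["inputs"]]
        context[node["output"]] = node["func"](*args)
    if return_full_exec_context:
        return context
    # By default, hand back only the tensor produced by the last node.
    final_output = nodes[-1]["output"]
    return {final_output: context[final_output]}


# Hypothetical two-node graph: scale the input, then add an offset.
toy_nodes = [
    {"inputs": ["global_in"], "output": "scaled", "func": lambda x: 2.0 * x},
    {"inputs": ["scaled"], "output": "global_out", "func": lambda x: x + 1.0},
]
toy_input = {"global_in": np.array([1.0, 2.0])}

result = execute_graph(toy_nodes, toy_input)
full_context = execute_graph(toy_nodes, toy_input, return_full_exec_context=True)
print(result["global_out"])
print(sorted(full_context.keys()))
```

Requesting the full context is mainly useful for debugging, because it exposes the intermediate tensors as well as the final output.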

The procedure is shown below. We take the model right before the nodes are converted into HLS layers and generate an input tensor to pass to the execution function. The input tensor is generated from the Brevitas example inputs.

In [4]:
import numpy as np
from qonnx.core.modelwrapper import ModelWrapper
input_dict = {"global_in": nph.to_array(input_tensor)}

model_for_sim = ModelWrapper(os.environ["FINN_BUILD_DIR"]+"/tfc_w1a1_ready_for_hls_conversion.onnx")

In [5]:
import finn.core.onnx_exec as oxe
output_dict = oxe.execute_onnx(model_for_sim, input_dict, return_full_exec_context=False)
output_pysim = output_dict[list(output_dict.keys())[0]]

try:
    assert np.isclose(output_pysim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all()
    print("Results are the same!")
except AssertionError:
    assert False, "The results are not the same!"

Results are the same!


The result is compared with the theoretical "golden" value for verification.
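The comparison above matches the simulated output against the index of the largest golden value rather than against the golden values themselves. The sketch below restates that check with `np.argmax`, using the golden output printed earlier in this notebook. It assumes the simulated model returns the top-1 class index; the helper `matches_golden` is a hypothetical name introduced for this example.

```python
import numpy as np

# Golden output as printed earlier in this notebook.
output_golden = np.array(
    [[-1.4090618, -1.3267527, 0.9779036, -1.2444434, -1.4090618,
      -1.6559892, -1.3267527, -1.4913709, -1.3267527, -1.6559892]],
    dtype=np.float32,
)

# Index of the largest golden value, i.e. the predicted class.
expected_label = int(np.argmax(output_golden[0]))


def matches_golden(simulated_output, golden_logits, atol=1e-3):
    """Return True if the simulated top-1 index equals the golden argmax."""
    expected = np.argmax(golden_logits[0])
    return bool(np.isclose(np.asarray(simulated_output), expected, atol=atol).all())


print(expected_label)
print(matches_golden(np.array([[2.0]]), output_golden))
```

Unlike the `np.where` form, `np.argmax` always yields a single index, so ties between equal maxima would be resolved to the first occurrence.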

## Simulation (cppsim) using C++

For HLS custom op nodes in FINN, simulation using Python is no longer sufficient. After the nodes have been converted to HLS layers, simulation using C++ can be used instead. To do this, the input tensor is stored in a .npy file, and C++ code is generated that reads the values from the .npy array, streams them to the corresponding finn-hlslib function and writes the result to a new .npy file. That file can in turn be read in Python and processed in the FINN flow. This example uses the model after the folding factors have been set in the HLS layers. Please be aware that this is not the full model but the dataflow partition, so before executing at the end of this section we have to integrate the model back into the parent model.
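The .npy handoff between Python and the compiled simulation can be pictured with the following sketch. It does not call any FINN code: `stand_in_executable` is a hypothetical Python placeholder for the compiled C++ binary, and the file names `input_0.npy` and `output.npy` are assumptions chosen for the illustration.

```python
import os
import tempfile

import numpy as np


def stand_in_executable(work_dir):
    """Hypothetical placeholder for the compiled C++ simulation binary.

    Reads the input .npy file, applies some computation (here: doubling),
    and writes the result to a new .npy file in the same directory.
    """
    data = np.load(os.path.join(work_dir, "input_0.npy"))
    np.save(os.path.join(work_dir, "output.npy"), data * 2.0)


def run_via_npy(input_array):
    """Pass a tensor to the stand-in through .npy files and read the result."""
    with tempfile.TemporaryDirectory() as work_dir:
        np.save(os.path.join(work_dir, "input_0.npy"), input_array)
        stand_in_executable(work_dir)
        return np.load(os.path.join(work_dir, "output.npy"))


result = run_via_npy(np.arange(4, dtype=np.float32))
print(result)
```

Exchanging data through files keeps the Python side independent of how the node is implemented, which is what allows the same execution function to drive different simulation back ends.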

In [6]:
parent_model = ModelWrapper(build_dir+"/tfc_w1_a1_with_accl.onnx")

To generate the code for this simulation and to generate the executable two transformations are used:
* `PrepareCppSim`, which generates the C++ code for the corresponding HLS layer
* `CompileCppSim`, which compiles the C++ code and stores the path to the executable

In [7]:
from finn.transformation.fpgadataflow.prepare_cppsim import PrepareCppSim
from finn.transformation.fpgadataflow.compile_cppsim import CompileCppSim
from finn.transformation.fpgadataflow.set_exec_mode import SetExecMode

from qonnx.transformation.general import GiveUniqueNodeNames

from qonnx.custom_op.registry import getCustomOp

sdp_nodes = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")
models = []

for sdp_node in sdp_nodes:
    sdp_node = getCustomOp(sdp_node)
    dataflow_model_filename = sdp_node.get_nodeattr("model")
    models.append(ModelWrapper(dataflow_model_filename))

for i, (model, sdp_node) in enumerate(zip(models, sdp_nodes)): 
    model = model.transform(GiveUniqueNodeNames())
    model = model.transform(PrepareCppSim())
    model = model.transform(CompileCppSim())
    model = model.transform(SetExecMode("cppsim"))

    model_filename = build_dir+f"/tfc_w1_a1_for_cppsim_{i}.onnx"
    model.save(model_filename)
    getCustomOp(sdp_node).set_nodeattr("model", model_filename)


model_filename = build_dir+"/tfc_w1_a1_for_cppsim.onnx"
parent_model.save(model_filename)

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Found Sphinx: /opt/conda/bin/sphinx-build  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/finn_dev_streichg/code_gen_cppsim_AcclOut_0_07j462su


Using Xilinx HLS headers
Doxygen needs to be installed to generate the doxygen documentation
test


[  4%] Building CXX object CMakeFiles/cclobfm.dir/cclo_bfm.cpp.o


In file included from /home/streichg/finn/ACCL/test/model/bfm/cclo_bfm.h:20,
                 from /home/streichg/finn/ACCL/test/model/bfm/cclo_bfm.cpp:18:
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In member function ‘void accl_hls::ACCLCommand::start_call(ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<64>, ap_uint<64>, ap_uint<64>)’:
  150 |             io_section:{
      |             ^~~~~~~~~~
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In constructor ‘accl_hls::ACCLData::ACCLData(hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&, hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&)’:
  515 |         STREAM<stream_word> &cclo2krnl;
      |                              ^~~~~~~~~
  514 |         STREAM<stream_word> &krnl2cclo;
      |                              ^~~~~~~~~
  510 |         ACCLData(STREAM<stream_word> &krnl2cclo, STREAM<stream_word> &

[  9%] Linking CXX shared library lib/libcclobfm.so
[  9%] Built target cclobfm
[ 14%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/cmac.cpp.o
[ 19%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/networklayer.cpp.o
[ 23%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/mac.cpp.o
[ 23%] Built target vnx
[ 28%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/accl.cpp.o


/home/streichg/finn/ACCL/driver/xrt/src/accl.cpp: In member function ‘ACCL::communicatorId ACCL::ACCL::create_communicator(const std::vector<ACCL::rank_t>&, int)’:
 1064 |   communicatorId new_comm_id = communicators.size() - 1;
      |                  ^~~~~~~~~~~


[ 33%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/common.cpp.o
[ 38%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/communicator.cpp.o
[ 42%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/constants.cpp.o
[ 47%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/simdevice.cpp.o
[ 52%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/simbuffer.cpp.o
[ 57%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/fpgadevice.cpp.o
[ 61%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/home/streichg/finn/ACCL/test/model/zmq/zmq_client.cpp.o
[ 66%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/home/streichg/finn/ACCL/test/model/zmq/zmq_common.cpp.o
[ 71%] Linking CXX shared library ../.

/home/streichg/finn/ACCL/test/hardware/HiveNet/network/roce_v2/xrt_utils/src/hivenet.cpp: In member function ‘std::string roce::Hivenet::get_ip_subnet()’:
  106 |   uint32_t ip_subnet = hivenet.read_register(IPsubnet_off);
      |            ^~~~~~~~~


[ 80%] Built target network_roce_v2
[ 85%] Building CXX object CMakeFiles/accl_network_utils/CMakeFiles/accl_network_utils.dir/src/accl_network_utils.cpp.o


/home/streichg/finn/ACCL/driver/utils/accl_network_utils/src/accl_network_utils.cpp: In function ‘std::unique_ptr<ACCL::ACCL> accl_network_utils::initialize_accl(const std::vector<ACCL::rank_t>&, int, bool, accl_network_utils::acclDesign, xrt::device, std::filesystem::__cxx11::path, int, ACCL::addr_t, ACCL::addr_t, bool)’:
  332 |   std::size_t world_size = ranks.size();
      |               ^~~~~~~~~~


[ 85%] Built target accl_network_utils
[ 90%] Building CXX object CMakeFiles/node_model.dir/execute_AcclOut.cpp.o


In file included from /tmp/finn_dev_streichg/code_gen_cppsim_AcclOut_0_07j462su/execute_AcclOut.cpp:9:
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In member function ‘void accl_hls::ACCLCommand::start_call(ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<64>, ap_uint<64>, ap_uint<64>)’:
  150 |             io_section:{
      |             ^~~~~~~~~~
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In constructor ‘accl_hls::ACCLData::ACCLData(hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&, hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&)’:
  515 |         STREAM<stream_word> &cclo2krnl;
      |                              ^~~~~~~~~
  514 |         STREAM<stream_word> &krnl2cclo;
      |                              ^~~~~~~~~
  510 |         ACCLData(STREAM<stream_word> &krnl2cclo, STREAM<stream_word> &cclo2krnl) :
      |         ^~~~~~~~
In file include

[ 95%] Building CXX object CMakeFiles/node_model.dir/home/streichg/finn/deps/cnpy/cnpy.cpp.o


/home/streichg/finn/deps/cnpy/cnpy.cpp: In function ‘void cnpy::parse_npy_header(unsigned char*, size_t&, std::vector<long unsigned int>&, bool&)’:
   64 |     uint8_t major_version = *reinterpret_cast<uint8_t*>(buffer+6);
      |             ^~~~~~~~~~~~~
   65 |     uint8_t minor_version = *reinterpret_cast<uint8_t*>(buffer+7);
      |             ^~~~~~~~~~~~~
/home/streichg/finn/deps/cnpy/cnpy.cpp: In function ‘cnpy::NpyArray load_the_npz_array(FILE*, uint32_t, uint32_t)’:
  199 |     int err;
      |         ^~~


[100%] Linking CXX executable bin/node_model
[100%] Built target node_model
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Found Sphinx: /opt/conda/bin/sphinx-build  
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/finn_dev_streichg/code_gen_cppsim_AcclIn_0_f0x7ijz5
[  4%] Building CXX object CMakeFiles/cclobfm.dir/cclo_bfm.cpp.o


Using Xilinx HLS headers
Doxygen needs to be installed to generate the doxygen documentation
test
In file included from /home/streichg/finn/ACCL/test/model/bfm/cclo_bfm.h:20,
                 from /home/streichg/finn/ACCL/test/model/bfm/cclo_bfm.cpp:18:
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In member function ‘void accl_hls::ACCLCommand::start_call(ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<64>, ap_uint<64>, ap_uint<64>)’:
  150 |             io_section:{
      |             ^~~~~~~~~~
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In constructor ‘accl_hls::ACCLData::ACCLData(hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&, hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&)’:
  515 |         STREAM<stream_word> &cclo2krnl;
      |                              ^~~~~~~~~
  514 |         STREAM<stream_word> &krnl2cclo;
   

[  9%] Linking CXX shared library lib/libcclobfm.so
[  9%] Built target cclobfm
[ 14%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/cmac.cpp.o
[ 19%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/networklayer.cpp.o
[ 23%] Building CXX object CMakeFiles/accl_network_utils/vnx/CMakeFiles/vnx.dir/src/mac.cpp.o
[ 23%] Built target vnx
[ 28%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/accl.cpp.o


/home/streichg/finn/ACCL/driver/xrt/src/accl.cpp: In member function ‘ACCL::communicatorId ACCL::ACCL::create_communicator(const std::vector<ACCL::rank_t>&, int)’:
 1064 |   communicatorId new_comm_id = communicators.size() - 1;
      |                  ^~~~~~~~~~~


[ 33%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/common.cpp.o
[ 38%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/communicator.cpp.o
[ 42%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/constants.cpp.o
[ 47%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/simdevice.cpp.o
[ 52%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/simbuffer.cpp.o
[ 57%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/src/fpgadevice.cpp.o
[ 61%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/home/streichg/finn/ACCL/test/model/zmq/zmq_client.cpp.o
[ 66%] Building CXX object CMakeFiles/accl_network_utils/xrt/CMakeFiles/accl.dir/home/streichg/finn/ACCL/test/model/zmq/zmq_common.cpp.o
[ 71%] Linking CXX shared library ../.

/home/streichg/finn/ACCL/test/hardware/HiveNet/network/roce_v2/xrt_utils/src/hivenet.cpp: In member function ‘std::string roce::Hivenet::get_ip_subnet()’:
  106 |   uint32_t ip_subnet = hivenet.read_register(IPsubnet_off);
      |            ^~~~~~~~~


[ 80%] Built target network_roce_v2
[ 85%] Building CXX object CMakeFiles/accl_network_utils/CMakeFiles/accl_network_utils.dir/src/accl_network_utils.cpp.o


/home/streichg/finn/ACCL/driver/utils/accl_network_utils/src/accl_network_utils.cpp: In function ‘std::unique_ptr<ACCL::ACCL> accl_network_utils::initialize_accl(const std::vector<ACCL::rank_t>&, int, bool, accl_network_utils::acclDesign, xrt::device, std::filesystem::__cxx11::path, int, ACCL::addr_t, ACCL::addr_t, bool)’:
  332 |   std::size_t world_size = ranks.size();
      |               ^~~~~~~~~~


[ 85%] Built target accl_network_utils
[ 90%] Building CXX object CMakeFiles/node_model.dir/execute_AcclIn.cpp.o


In file included from /tmp/finn_dev_streichg/code_gen_cppsim_AcclIn_0_f0x7ijz5/execute_AcclIn.cpp:9:
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In member function ‘void accl_hls::ACCLCommand::start_call(ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<32>, ap_uint<64>, ap_uint<64>, ap_uint<64>)’:
  150 |             io_section:{
      |             ^~~~~~~~~~
/home/streichg/finn/ACCL/test/model/bfm/../../../driver/hls/accl_hls.h: In constructor ‘accl_hls::ACCLData::ACCLData(hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&, hlslib::Stream<hls::axis<ap_uint<512>, 0, 0, 8> >&)’:
  515 |         STREAM<stream_word> &cclo2krnl;
      |                              ^~~~~~~~~
  514 |         STREAM<stream_word> &krnl2cclo;
      |                              ^~~~~~~~~
  510 |         ACCLData(STREAM<stream_word> &krnl2cclo, STREAM<stream_word> &cclo2krnl) :
      |         ^~~~~~~~
In file included 

[ 95%] Building CXX object CMakeFiles/node_model.dir/home/streichg/finn/deps/cnpy/cnpy.cpp.o


/home/streichg/finn/deps/cnpy/cnpy.cpp: In function ‘void cnpy::parse_npy_header(unsigned char*, size_t&, std::vector<long unsigned int>&, bool&)’:
   64 |     uint8_t major_version = *reinterpret_cast<uint8_t*>(buffer+6);
      |             ^~~~~~~~~~~~~
   65 |     uint8_t minor_version = *reinterpret_cast<uint8_t*>(buffer+7);
      |             ^~~~~~~~~~~~~
/home/streichg/finn/deps/cnpy/cnpy.cpp: In function ‘cnpy::NpyArray load_the_npz_array(FILE*, uint32_t, uint32_t)’:
  199 |     int err;
      |         ^~~


[100%] Linking CXX executable bin/node_model
[100%] Built target node_model


When we take a look at the model using Netron, we can see that the transformations introduced new attributes.

In [8]:
showInNetron(build_dir+"/tfc_w1_a1_for_cppsim_1.onnx")

Serving '/tmp/finn_dev_streichg/tfc_w1_a1_for_cppsim_1.onnx' at http://0.0.0.0:8081


The following node attributes have been added:
* `code_gen_dir_cppsim` indicates the directory where the files for the simulation using C++ are stored
* `executable_path` specifies the path to the executable

We now take a closer look at the files that were generated:

Besides the .cpp file, the folder contains .h files with the weights and thresholds. The shell script contains the compile command and *node_model* is the executable generated by compilation. Comparing this with the `executable_path` node attribute, it can be seen that it specifies exactly the path to *node_model*.
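One way to inspect such a folder from Python is to walk the directory and list its files. The helper below is a generic sketch using only the standard library; `list_generated_files` is a hypothetical name, and the directory passed to it is assumed to be the value of the `code_gen_dir_cppsim` node attribute.

```python
import os


def list_generated_files(code_gen_dir):
    """Return all files below code_gen_dir as sorted relative paths."""
    entries = []
    for root, _dirs, files in os.walk(code_gen_dir):
        for name in files:
            full_path = os.path.join(root, name)
            entries.append(os.path.relpath(full_path, code_gen_dir))
    return sorted(entries)


# Assumed usage, with the directory taken from the node attribute:
# code_gen_dir = getCustomOp(node).get_nodeattr("code_gen_dir_cppsim")
# for entry in list_generated_files(code_gen_dir):
#     print(entry)
```

Listing relative paths makes it easy to compare the folder contents with the `executable_path` attribute, since the executable should appear as one of the entries.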

To simulate the model, the execution mode (`exec_mode`) must be set to "cppsim". This is done using the transformation `SetExecMode`.

Before the model can be executed using `execute_onnx`, we integrate the child model into the parent model. The function then reads the `exec_mode` and writes the input into the correct directory as a .npy file. To be able to read this in C++, FINN provides an additional .hpp file ([npy2apintstream.hpp](https://github.com/Xilinx/finn/blob/main/src/finn/qnn-data/cpp/npy2apintstream.hpp)), which uses cnpy to read .npy files and convert them into streams, or to read a stream and write it into a .npy file. [cnpy](https://github.com/rogersce/cnpy) is a helper for reading and writing the .npy and .npz formats in C++.

The result is again compared to the "golden" output.

In [9]:
output_dict = oxe.execute_onnx(parent_model, input_dict)
output_cppsim = output_dict[list(output_dict.keys())[0]]

try:
    assert np.isclose(output_cppsim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all()
    print("Results are the same!")
except AssertionError:
    assert False, "The results are not the same!"

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/finn_dev_streichg/code_gen_cppsim_AcclOut_0_8bvep1it/bin/node_model'

## Emulation (rtlsim) using PyVerilator

Emulation using [PyVerilator](https://github.com/maltanar/pyverilator) can be done after IP blocks have been generated from the corresponding HLS layers. PyVerilator is a tool that makes it possible to simulate Verilog files using Verilator via a Python interface.

There are two ways to use rtlsim. One is to run the model node by node, as with the simulation methods. Alternatively, if the model is in the form of a dataflow partition, the part of the graph that consists only of HLS nodes can be executed as a whole.

Because the model is already in split form (parent graph consisting of non-HLS layers and child graph consisting only of HLS layers) at the point where we want to grab and verify it, we first have to reference the child graph within the parent graph. This is done using the node attribute `model` of the `StreamingDataflowPartition` node.

The procedure is shown first for a child graph that has IP blocks corresponding to the individual layers, and then for a child graph that already has a stitched IP.

### Emulation of model node-by-node

The child model is loaded and the `exec_mode` for each node is set. To prepare the node-by-node emulation, the transformation `PrepareRTLSim` is applied to the child model. With this transformation, the emulation files are created for each node and can be used directly when calling `execute_onnx()`. After the transformation, each node has a new node attribute `rtlsim_so`, which contains the path to the corresponding emulation files. The child model is then saved in a new .onnx file so that the changed model can be referenced in the parent model.

In [15]:
from finn.transformation.fpgadataflow.prepare_rtlsim import PrepareRTLSim
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP

test_fpga_part = "xc7z020clg400-1"
target_clk_ns = 10

child_model = ModelWrapper(build_dir + "/tfc_w1_a1_set_folding_factors.onnx")
child_model = child_model.transform(GiveUniqueNodeNames())
child_model = child_model.transform(PrepareIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(HLSSynthIP())
child_model = child_model.transform(SetExecMode("rtlsim"))
child_model = child_model.transform(PrepareRTLSim())
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx")

                                                                                                       : ... In instance LabelSelect_Batch_0.grp_LabelSelect_Batch_0_Pipeline_VITIS_LOOP_488_3_fu_45.flow_control_loop_pipe_sequential_init_U
   54 | #0 ap_loop_init_int = 1'b1;
      |  ^
                  ... Use "/* verilator lint_off STMTDLY */" and lint_on around source to disable this message.
                                                                                                       : ... In instance LabelSelect_Batch_0.grp_LabelSelect_Batch_0_Pipeline_VITIS_LOOP_488_3_fu_45.flow_control_loop_pipe_sequential_init_U
   55 | #0 ap_done_cache = 1'b0;
      |  ^
                                                                                                        : ... In instance LabelSelect_Batch_0.grp_LabelSelect_Batch_0_Pipeline_VITIS_LOOP_488_3_fu_45
  435 | #0 ap_CS_fsm = 1'd1;
      |  ^
                                                                                   

make: Entering directory '/tmp/finn_dev_streichg/pyverilator_LabelSelect_Batch_0_2xfkntn1'
ccache g++  -I.  -MMD -I/usr/local/share/verilator/include -I/usr/local/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -DVM_TRACE_VCD=1 -faligned-new -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -fPIC --std=c++11  -std=gnu++14 -Os -c -o pyverilator_wrapper.o /tmp/finn_dev_streichg/pyverilator_LabelSelect_Batch_0_2xfkntn1/pyverilator_wrapper.cpp
ccache g++  -I.  -MMD -I/usr/local/share/verilator/include -I/usr/local/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -DVM_TRACE_VCD=1 -faligned-new -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -fPIC --std=c++11  -std=gnu++14 -Os -c -o verilated.o /usr/local/share/verilator/includ

                                                                                                         : ... In instance Thresholding_Batch_0.grp_Thresholding_Batch_fu_134.flow_control_loop_pipe_sequential_init_U
   96 | #0 ap_loop_init_int = 1'b1;
      |  ^
                  ... Use "/* verilator lint_off STMTDLY */" and lint_on around source to disable this message.
                                                                                                         : ... In instance Thresholding_Batch_0.grp_Thresholding_Batch_fu_134.flow_control_loop_pipe_sequential_init_U
   97 | #0 ap_done_cache = 1'b0;
      |  ^
                                                                                                          : ... In instance Thresholding_Batch_0.grp_Thresholding_Batch_fu_134
  830 | #0 ap_CS_iter0_fsm = 1'd1;
      |  ^
                                                                                                          : ... In instance Thresholding_Batch_0.g

make: Entering directory '/tmp/finn_dev_streichg/pyverilator_MatrixVectorActivation_2_p3kp6qxw'
ccache g++  -I.  -MMD -I/usr/local/share/verilator/include -I/usr/local/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -DVM_TRACE_VCD=1 -faligned-new -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -fPIC --std=c++11  -std=gnu++14 -Os -c -o pyverilator_wrapper.o /tmp/finn_dev_streichg/pyverilator_MatrixVectorActivation_2_p3kp6qxw/pyverilator_wrapper.cpp
ccache g++  -I.  -MMD -I/usr/local/share/verilator/include -I/usr/local/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -DVM_TRACE_VCD=1 -faligned-new -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -fPIC --std=c++11  -std=gnu++14 -Os -c -o verilated.o /usr/local/share/verila

The next step is to load the parent model and set the node attribute `model` in the StreamingDataflowPartition node (`sdp_node`). Afterwards the `exec_mode` is set in the parent model in each node and the model can be executed.

In [None]:
# parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# reference child model
sdp_node = getCustomOp(model_for_rtlsim.graph.node[1])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")

model_for_rtlsim = model_for_rtlsim.transform(SetExecMode("rtlsim"))

In [None]:
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]

try:
    assert np.isclose(output_rtlsim, np.where(output_golden[0]==np.amax(output_golden[0])), atol=1e-3).all()
    print("Results are the same!")
except AssertionError:
    assert False, "The results are not the same!"

### Emulation of stitched IP

Here we use the same procedure. First the child model is loaded, but in contrast to the node-by-node emulation, the metadata property `exec_mode` is set to "rtlsim" for the whole child model. When the model is integrated and executed in the last step, the Verilog files of the stitched IP of the child model are used.

In [16]:
from finn.transformation.fpgadataflow.insert_dwc import InsertDWC
from finn.transformation.fpgadataflow.insert_fifo import InsertFIFO
from finn.transformation.fpgadataflow.create_stitched_ip import CreateStitchedIP

child_model = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_child.onnx")
child_model = child_model.transform(InsertDWC())

# set all impl_styles of the DWCs to hls to enable emulation
dwc_nodes = child_model.get_nodes_by_op_type("StreamingDataWidthConverter_Batch")
for dwc in dwc_nodes:
    dwc_inst = getCustomOp(dwc)
    dwc_inst.set_nodeattr("impl_style", "hls")
    
child_model = child_model.transform(InsertFIFO(create_shallow_fifos=True))
child_model.save(build_dir + "/test.onnx");
child_model = child_model.transform(GiveUniqueNodeNames())
child_model = child_model.transform(PrepareIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(HLSSynthIP())
child_model = child_model.transform(CreateStitchedIP(test_fpga_part, target_clk_ns))
child_model = child_model.transform(PrepareRTLSim())
child_model.set_metadata_prop("exec_mode","rtlsim")
child_model.save(build_dir + "/tfc_w1_a1_dataflow_child.onnx");

In [None]:
showInNetron(build_dir+"/tfc_w1_a1_dataflow_child.onnx")

In [None]:
# load the parent model
model_for_rtlsim = ModelWrapper(build_dir + "/tfc_w1_a1_dataflow_parent.onnx")
# point the dataflow partition node at the child model prepared for rtlsim
sdp_node = getCustomOp(model_for_rtlsim.graph.node[1])
sdp_node.set_nodeattr("model", build_dir + "/tfc_w1_a1_dataflow_child.onnx")

In [None]:
output_dict = oxe.execute_onnx(model_for_rtlsim, input_dict)
output_rtlsim = output_dict[list(output_dict.keys())[0]]

In [None]:
# index of the largest golden score, i.e. the label the simulation should return
expected_label = np.where(output_golden[0]==np.amax(output_golden[0]))
assert np.isclose(output_rtlsim, expected_label, atol=1e-3).all(), "The results are not the same!"
print("Results are the same!")
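
The comparison above checks a predicted label against the position of the largest golden score. A minimal plain-Python sketch of that idea is shown below; the helper names `top1_label` and `labels_match` and the example scores are made up for illustration, and the tolerance check is a simplification of `np.isclose` that uses only an absolute tolerance.

```python
def top1_label(scores):
    """Return the index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def labels_match(predicted_label, golden_scores, atol=1e-3):
    """True if the simulated label equals the golden top-1 index within atol."""
    return abs(float(predicted_label) - top1_label(golden_scores)) <= atol


# Made-up scores for a four-class example, not real model output.
golden_scores = [-1.2, 0.3, 4.5, 0.1]
print(labels_match(2.0, golden_scores))  # label 2 has the largest score
print(labels_match(1.0, golden_scores))  # label 1 does not
```

Comparing labels rather than raw scores is what makes this check applicable when the simulated model ends in a top-1 selection while the golden reference still outputs per-class scores.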