# End-to-End FINN Flow for a Simple Convolutional Net
-----------------------------------------------------------------

In this notebook, we will go through the FINN steps needed to take a binarized convolutional network all the way down to a heterogeneous streaming dataflow accelerator running on the FPGA. 

It's recommended to go through the simpler [end-to-end notebook for a fully connected network](tfc_end2end_example.ipynb) first, since many steps here are very similar and we will focus on what is done differently for convolutions.

This notebook is quite lengthy, and some of the cells (involving Vivado synthesis) may take up to an hour to finish running. To let you save and resume your progress, we will save the intermediate ONNX models that are generated in the various steps to disk, so that you can jump back directly to where you left off.
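As a rough illustration of the save-and-resume idea, the sketch below looks up the most recent checkpoint that already exists on disk. The helper names `checkpoint_path` and `latest_checkpoint` and the `STEP_ORDER` list are hypothetical and not part of FINN; the file naming simply mirrors the `end2end_cnv_w1a1_<step>.onnx` pattern used in the cells of this notebook.

```python
import os

# Hypothetical step order, mirroring the checkpoint names saved in this notebook.
STEP_ORDER = ["export", "tidy", "pre_post", "streamlined", "dataflow_parent"]


def checkpoint_path(build_dir, step_name, prefix="end2end_cnv_w1a1"):
    """Return the path of the ONNX checkpoint written after a given step."""
    return os.path.join(build_dir, f"{prefix}_{step_name}.onnx")


def latest_checkpoint(build_dir, step_order=STEP_ORDER):
    """Return (step_name, path) for the last step with a saved checkpoint, or None."""
    found = None
    for step in step_order:
        path = checkpoint_path(build_dir, step)
        if os.path.exists(path):
            found = (step, path)
    return found
```

If `latest_checkpoint(build_dir)` returns, say, the `streamlined` entry, you could load that file with `ModelWrapper` and continue from section 3 instead of re-running the earlier cells.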

## Quick Introduction to the CNV-w1a1 Network

The particular quantized neural network (QNN) we will be targeting in this notebook is referred to as CNV-w1a1 and it classifies 32x32 RGB images into one of ten CIFAR-10 classes. All weights and activations in this network are quantized to bipolar values (either -1 or +1), with the exception of the input (which is RGB with 8 bits per channel) and the final output (which is 32-bit numbers). It first appeared in the original [FINN paper](https://arxiv.org/abs/1612.07119) from ISFPGA'17 with the name CNV, as a variant of the binarized convolutional network from the [BinaryNet paper](https://arxiv.org/abs/1602.02830), in turn inspired by the VGG-11 topology which was the runner-up for the 2014 [ImageNet Large Scale Visual Recognition Challenge](http://www.image-net.org/challenges/LSVRC/).
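To make "bipolar" concrete, here is a minimal pure-Python sketch of mapping real values to -1 or +1. The `binarize` helper is hypothetical, and the choice of mapping zero to +1 is an assumption made for illustration; the trained network's quantizers are defined by Brevitas, not by this snippet.

```python
def binarize(x):
    """Map a real value to a bipolar value (assumption: zero maps to +1)."""
    return 1 if x >= 0 else -1


weights = [0.7, -0.2, 0.0, -1.3]
bipolar_weights = [binarize(w) for w in weights]
print(bipolar_weights)  # [1, -1, 1, -1]
```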


You'll have a chance to interactively examine the layers that make up the network in Netron in a moment, so that's enough about the network for now. 

## Quick Recap of the End-to-End Flow

The FINN compiler comes with many *transformations* that modify the ONNX representation of the network according to certain patterns. This notebook will demonstrate a *possible* sequence of such transformations to take a particular trained network all the way down to hardware, as shown in the figure below.

![](finn-design-flow-example.svg)

The white fields show the state of the network representation in the respective step. The colored fields represent the transformations that are applied to the network to achieve a certain result. The diagram is divided into 5 sections, each represented by a different color and each comprising several flow steps. The flow starts in the top left corner with Brevitas export (green section), followed by the preparation of the network (blue section) for the Vitis HLS synthesis and Vivado IPI stitching (orange section), and finally building a PYNQ overlay bitfile and testing it on a PYNQ board (yellow section).
There is an additional section for functional verification (red section) on the left side of the diagram, which we will not cover in this notebook. For details, please take a look at the [verification notebook](tfc_end2end_verification.ipynb).


We will use the helper function `showInNetron` to show the ONNX model at the current transformation step. The Netron displays are interactive, but they only work when running the notebook actively and not on GitHub (i.e. if you are viewing this on GitHub you'll only see blank squares).

In [1]:
from finn.util.basic import make_build_dir
from finn.util.visualization import showInNetron
import os
    
build_dir = os.environ["FINN_BUILD_DIR"]

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import brevitas.nn as qnn
from brevitas.quant.scaled_int import Int8ActPerTensorFloat
# Copyright (C) 2023, Advanced Micro Devices, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause

from dependencies import value

from brevitas.core.bit_width import BitWidthImplType
from brevitas.core.quant import QuantType
from brevitas.core.restrict_val import FloatToIntImplType
from brevitas.core.restrict_val import RestrictValueType
from brevitas.core.scaling import ScalingImplType
from brevitas.core.zero_point import ZeroZeroPoint
from brevitas.inject import ExtendedInjector
from brevitas.quant.solver import ActQuantSolver
from brevitas.quant.solver import WeightQuantSolver
from brevitas.quant import Uint8ActPerTensorFloat

class CommonQuant(ExtendedInjector):
    bit_width_impl_type = BitWidthImplType.CONST
    scaling_impl_type = ScalingImplType.CONST
    restrict_scaling_type = RestrictValueType.FP
    zero_point_impl = ZeroZeroPoint
    float_to_int_impl_type = FloatToIntImplType.ROUND
    scaling_per_output_channel = False
    narrow_range = True
    signed = True

    @value
    def quant_type(bit_width):
        if bit_width is None:
            return QuantType.FP
        elif bit_width == 1:
            return QuantType.BINARY
        else:
            return QuantType.INT


class CommonWeightQuant(CommonQuant, WeightQuantSolver):
    scaling_const = 1.0


class CommonUintActQuant(Uint8ActPerTensorFloat):
    """
    Common unsigned act quantizer with bit-width set to None so that it's forced to be specified by
    each layer.
    """
    scaling_min_val = 2e-16
    bit_width = None
    restrict_scaling_type = RestrictValueType.LOG_FP
    
bit_quantization = 8

class QuantizedCNN(nn.Module):
    def __init__(self):
        super(QuantizedCNN, self).__init__()
        
        self.conv1 = qnn.QuantConv2d(
            4, 
            20, 
            kernel_size=3, stride=2, padding=1, 
            bias = False,
            weight_bit_width=bit_quantization, 
            weight_quant=CommonWeightQuant, 
        )
        self.relu1 = qnn.QuantReLU(
            act_quant=CommonUintActQuant,
            bit_width=bit_quantization,
            return_quant_tensor=True
        )
        
        self.conv2 = qnn.QuantConv2d(
            20, 
            8, 
            kernel_size=1, stride=1,
            bias = False,
            weight_bit_width=bit_quantization, 
            weight_quant=CommonWeightQuant, 
        )
        self.relu2 = qnn.QuantReLU(
            act_quant=CommonUintActQuant,
            bit_width=bit_quantization,
            return_quant_tensor=True
        )
        
        self.fc1 = qnn.QuantLinear(
            8*16*16, 
            6, 
            bias = False,
            weight_bit_width=bit_quantization, 
            weight_quant=CommonWeightQuant, 
        )

        # for m in self.modules():
        #     if isinstance(m, qnn.QuantConv2d) or isinstance(m, qnn.QuantLinear):
        #         nn.init.uniform_(m.weight.data, -1, 1)
        #         if m.bias is not None:
        #             nn.init.zeros_(m.bias.data)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        
        x = x.view(x.size(0), -1)
        
        x = self.fc1(x)      
        return x

## 1. Brevitas Export, FINN Import and Tidy-Up

Similar to what we did in the TFC-w1a1 end-to-end notebook, we will start by exporting the [pretrained CNV-w1a1 network](https://github.com/Xilinx/brevitas/tree/master/src/brevitas_examples/bnn_pynq) to ONNX, importing that into FINN and running the "tidy-up" transformations to have a first look at the topology. The network will be exported in QONNX format and then converted into the FINN-ONNX format to prepare it for the FINN compiler.

In [3]:
import torch
import onnx
from finn.util.test import get_test_model_trained
from brevitas.export import export_qonnx
from qonnx.util.cleanup import cleanup as qonnx_cleanup
from qonnx.core.modelwrapper import ModelWrapper
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.fold_constants import FoldConstants
from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames, RemoveStaticGraphInputs

cnv = QuantizedCNN()
cnv.load_state_dict(torch.load("cnn-sat6-w8.pt"))

print(cnv)

export_onnx_path = build_dir + "/end2end_cnv_w1a1_export.onnx"
export_qonnx(cnv, torch.randn(1, 4, 32, 32), export_onnx_path)
qonnx_cleanup(export_onnx_path, out_file=export_onnx_path)
model = ModelWrapper(export_onnx_path)
model = model.transform(ConvertQONNXtoFINN())
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(RemoveStaticGraphInputs())
model.save(build_dir + "/end2end_cnv_w1a1_tidy.onnx")

QuantizedCNN(
  (conv1): QuantConv2d(
    4, 20, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    (input_quant): ActQuantProxyFromInjector(
      (_zero_hw_sentinel): StatelessBuffer()
    )
    (output_quant): ActQuantProxyFromInjector(
      (_zero_hw_sentinel): StatelessBuffer()
    )
    (weight_quant): WeightQuantProxyFromInjector(
      (_zero_hw_sentinel): StatelessBuffer()
      (tensor_quant): RescalingIntQuant(
        (int_quant): IntQuant(
          (float_to_int_impl): RoundSte()
          (tensor_clamp_impl): TensorClampSte()
          (delay_wrapper): DelayWrapper(
            (delay_impl): _NoDelay()
          )
        )
        (scaling_impl): ConstScaling(
          (restrict_clamp_scaling): _RestrictClampValue(
            (clamp_min_ste): Identity()
            (restrict_value_impl): FloatRestrictValue()
          )
          (value): StatelessBuffer()
        )
        (int_scaling_impl): IntScaling()
        (zero_point_impl): ZeroZeroPoint(
    

Now that the model is exported, let's have a look at its layer structure with Netron. Remember that the visualization below is interactive, you can click on the individual nodes and view the layer attributes, trained weights and so on.

In [4]:
showInNetron(build_dir+"/end2end_cnv_w1a1_tidy.onnx")

Serving '/tmp/finn_dev_artti/end2end_cnv_w1a1_tidy.onnx' at http://0.0.0.0:8081


You can see that the network is composed of a repeating convolution-convolution-maxpool layer pattern to extract features using 3x3 convolution kernels (with weights binarized), followed by fully connected layers acting as the classifier. Also notice the initial `MultiThreshold` layer at the beginning of the network, which is quantizing float inputs to 8-bit ones.

### Adding Pre- and Postprocessing <a id='prepost'></a>

Preprocessing and postprocessing steps can be added directly in the ONNX graph. In this case, the preprocessing step divides the input `uint8` data by 255 so the inputs to the CNV-w1a1 network are bounded between [0, 1]. The postprocessing step takes the output of the network and returns the index (0-9) of the image category with the highest probability (top-1). 
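The two steps described above can be sketched in plain Python, independent of the ONNX graph. The helper names `preprocess` and `top1` are hypothetical; in the actual flow these operations are nodes inserted into the model rather than Python functions.

```python
def preprocess(pixels):
    """Scale uint8 pixel values (0..255) into the range [0, 1]."""
    return [p / 255.0 for p in pixels]


def top1(scores):
    """Return the index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


scaled = preprocess([0, 255, 51])
best_class = top1([0.1, 2.5, -1.0])
```

Doing the top-1 selection inside the graph means only a single class index has to leave the accelerator instead of the full score vector.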

In [5]:
# from finn.util.pytorch import ToTensor
# from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.datatype import DataType

model = ModelWrapper(build_dir+"/end2end_cnv_w1a1_tidy.onnx")
# global_inp_name = model.graph.input[0].name
# ishape = model.get_tensor_shape(global_inp_name)
# # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
# totensor_pyt = ToTensor()
# chkpt_preproc_name = build_dir+"/end2end_cnv_w1a1_preproc.onnx"
# export_qonnx(totensor_pyt, torch.randn(ishape), chkpt_preproc_name)
# qonnx_cleanup(chkpt_preproc_name, out_file=chkpt_preproc_name)
# pre_model = ModelWrapper(chkpt_preproc_name)
# pre_model = pre_model.transform(ConvertQONNXtoFINN())

# # join preprocessing and core model
# model = model.transform(MergeONNXModels(pre_model))
# # add input quantization annotation: UINT8 for all BNN-PYNQ models
global_inp_name = model.graph.input[0].name
model.set_tensor_datatype(global_inp_name, DataType["UINT8"])

In [6]:
from qonnx.transformation.insert_topk import InsertTopK
from qonnx.transformation.infer_datatypes import InferDataTypes

# postprocessing: insert Top-1 node at the end
model = model.transform(InsertTopK(k=1))
chkpt_name = build_dir+"/end2end_cnv_w1a1_pre_post.onnx"
# tidy-up again
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())
model = model.transform(InferDataTypes())
model = model.transform(RemoveStaticGraphInputs())
model.save(chkpt_name)

In [7]:
showInNetron(build_dir+"/end2end_cnv_w1a1_pre_post.onnx")

Stopping http://0.0.0.0:8081
Serving '/tmp/finn_dev_artti/end2end_cnv_w1a1_pre_post.onnx' at http://0.0.0.0:8081


## 2. How FINN Implements Convolutions: Lowering and Streamlining

In FINN, we implement convolutions with the *lowering* approach: we convert them to matrix-matrix multiply operations, where one of the matrices is generated by sliding a window over the input image. You can read more about the sliding window operator and how convolution lowering works [in this notebook](https://github.com/maltanar/qnn-inference-examples/blob/master/3-convolutional-binarized-gtsrb.ipynb). The streaming dataflow architecture we will end up with is going to look something like this figure from the [FINN-R paper](https://arxiv.org/abs/1809.04570):

![](cnv-mp-fc.png)

Note how the convolution layer looks very similar to the fully connected one in terms of the matrix-vector-threshold unit (MVTU), but now the MVTU is preceded by a sliding window unit that produces the matrix from the input image. All of these building blocks, including the `MaxPool` layer you see in this figure, exist as templated Vitis HLS C++ functions in [finn-hlslib](https://github.com/Xilinx/finn-hlslib).
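The lowering idea can be sketched in a few lines of pure Python. This is a simplified illustration for a single-channel image with stride 1 and no padding, not FINN's `Im2Col` implementation; the function names are hypothetical. The sliding window produces one row per output pixel, and the convolution then reduces to a matrix-vector product with the flattened kernel.

```python
def im2col(image, k):
    """Slide a k x k window (stride 1, no padding) over a 2D image.

    Returns one row per output pixel, each holding the k*k window values.
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def conv2d_direct(image, kernel):
    """Reference: direct sliding-window convolution (cross-correlation form)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        sum(image[i + di][j + dj] * kernel[di][dj] for di in range(k) for dj in range(k))
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
flat_kernel = [v for row in kernel for v in row]
lowered = matvec(im2col(image, 2), flat_kernel)
print(lowered)  # [-4, -4, -4, -4]
```

With several output channels, the flattened kernels form the columns of a weight matrix, so the same window matrix feeds a matrix-matrix multiply.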


To target this kind of hardware architecture with our network we'll apply a convolution lowering transformation, in addition to streamlining. You may recall the *streamlining transformation* that we applied to the TFC-w1a1 network, which is a series of mathematical simplifications that allow us to get rid of floating point scaling operations by implementing few-bit activations as thresholding operations. 
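The core trick behind streamlining can be shown with a small sketch. It assumes a step-function activation that counts how many thresholds the input meets or exceeds, and a preceding affine operation `scale * x + bias` with a positive scale; the helper names are hypothetical and this is not FINN's `Streamline` code. Folding the affine operation into the thresholds leaves only comparisons on the unscaled input.

```python
def multithreshold(x, thresholds):
    """Step-function activation: count the thresholds that x meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


def absorb_scale_and_bias(thresholds, scale, bias):
    """Fold y = scale * x + bias into the thresholds (assumes scale > 0)."""
    if scale <= 0:
        raise ValueError("this sketch only handles a positive scale")
    return [(t - bias) / scale for t in thresholds]


scale, bias = 0.5, 1.0
thresholds = [0.5, 1.5, 2.5]
absorbed = absorb_scale_and_bias(thresholds, scale, bias)

# The floating point multiply and add disappear; results stay identical.
for x in range(-5, 6):
    assert multithreshold(scale * x + bias, thresholds) == multithreshold(x, absorbed)
```

A negative scale would flip the direction of the comparison, which is one reason the real transformations need more care than this sketch.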

**The current implementation of streamlining is highly network-specific and may not work for your network if its topology is very different from the example network here. We hope to rectify this in future releases.**

In [8]:
from finn.transformation.streamline import Streamline
from qonnx.transformation.lower_convs_to_matmul import LowerConvsToMatMul
from qonnx.transformation.bipolar_to_xnor import ConvertBipolarMatMulToXnorPopcount
import finn.transformation.streamline.absorb as absorb
from finn.transformation.streamline.reorder import MakeMaxPoolNHWC, MoveScalarLinearPastInvariants
from qonnx.transformation.infer_data_layouts import InferDataLayouts
from qonnx.transformation.general import RemoveUnusedTensors

model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_pre_post.onnx")
model = model.transform(MoveScalarLinearPastInvariants())
model = model.transform(Streamline())
model = model.transform(LowerConvsToMatMul())
model = model.transform(MakeMaxPoolNHWC())
model = model.transform(absorb.AbsorbTransposeIntoMultiThreshold())
model = model.transform(ConvertBipolarMatMulToXnorPopcount())
model = model.transform(Streamline())
# absorb final add-mul nodes into TopK
model = model.transform(absorb.AbsorbScalarMulAddIntoTopK())
model = model.transform(InferDataLayouts())
model = model.transform(RemoveUnusedTensors())
model.save(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")



We won't go into too much detail about what happens in each transformation and why they are called in the particular order they are (feel free to visualize the intermediate steps using Netron yourself if you are curious) but here is a brief summary:

* `Streamline` moves floating point scaling and addition operations closer to the input of the nearest thresholding activation and absorbs them into thresholds
* `LowerConvsToMatMul` converts ONNX `Conv` nodes into sequences of `Im2Col, MatMul` nodes as discussed above. `Im2Col` is a custom FINN ONNX high-level node type that implements the sliding window operator.
* `MakeMaxPoolNHWC` and `AbsorbTransposeIntoMultiThreshold` convert the *data layout* of the network into the NHWC data layout that finn-hlslib primitives use. NHWC means the tensor dimensions are ordered as `(N : batch, H : height, W : width, C : channels)` (assuming 2D images). The ONNX standard ops normally use the NCHW layout, but the ONNX intermediate representation itself does not dictate any data layout.
* You may recall `ConvertBipolarMatMulToXnorPopcount` from the TFC-w1a1 example, which is needed to implement bipolar-by-bipolar (w1a1) networks correctly using finn-hlslib.
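The identity behind the bipolar-to-XNOR-popcount conversion can be checked with a short sketch. It assumes the encoding +1 → bit 1 and -1 → bit 0; the function names are hypothetical. For vectors of length `n`, the bipolar dot product equals `2 * popcount(xnor(a, b)) - n`, because every matching position contributes +1 and every mismatch contributes -1.

```python
def bipolar_dot(a, b):
    """Dot product of two vectors whose entries are -1 or +1."""
    return sum(x * y for x, y in zip(a, b))


def to_bits(v):
    """Encode bipolar values as bits: +1 -> 1, -1 -> 0."""
    return [1 if x == 1 else 0 for x in v]


def xnor_popcount(a_bits, b_bits):
    """Count the positions where both bits agree (XNOR followed by popcount)."""
    return sum(1 for x, y in zip(a_bits, b_bits) if x == y)


a = [1, -1, -1, 1]
b = [1, 1, -1, -1]
n = len(a)
popcount = xnor_popcount(to_bits(a), to_bits(b))
print(bipolar_dot(a, b), 2 * popcount - n)  # 0 0
```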

Let's visualize the streamlined and lowered network with Netron. Observe how all the `Conv` nodes have turned into pairs of `Im2Col, MatMul` nodes, and how many nodes, including `BatchNorm, Mul, Add` nodes, have disappeared and been replaced with `MultiThreshold` nodes.

In [9]:
showInNetron(build_dir+"/end2end_cnv_w1a1_streamlined.onnx")

Stopping http://0.0.0.0:8081
Serving '/tmp/finn_dev_artti/end2end_cnv_w1a1_streamlined.onnx' at http://0.0.0.0:8081


## 3. Partitioning, Conversion to HLS Layers and Folding

The next steps will be (again) very similar to what we did for the TFC-w1a1 network. We'll first convert the layers that we can put into the FPGA into their HLS equivalents and separate them out into a *dataflow partition*:


In [10]:
import finn.transformation.fpgadataflow.convert_to_hls_layers as to_hls
from finn.transformation.fpgadataflow.create_dataflow_partition import (
    CreateDataflowPartition,
)
from finn.transformation.move_reshape import RemoveCNVtoFCFlatten
from qonnx.custom_op.registry import getCustomOp
from qonnx.transformation.infer_data_layouts import InferDataLayouts

# choose the memory mode for the MVTU units, decoupled or const
mem_mode = "const"

model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
model = model.transform(to_hls.InferBinaryMatrixVectorActivation(mem_mode))
model = model.transform(to_hls.InferQuantizedMatrixVectorActivation(mem_mode))
# TopK to LabelSelect
model = model.transform(to_hls.InferLabelSelectLayer())
# input quantization (if any) to standalone thresholding
model = model.transform(to_hls.InferThresholdingLayer())
model = model.transform(to_hls.InferConvInpGen())
model = model.transform(to_hls.InferStreamingMaxPool())
# get rid of Reshape(-1, 1) operation between hlslib nodes
model = model.transform(RemoveCNVtoFCFlatten())
# get rid of Transpose -> Transpose identity seq
model = model.transform(absorb.AbsorbConsecutiveTransposes())
# infer tensor data layouts
model = model.transform(InferDataLayouts())
parent_model = model.transform(CreateDataflowPartition())
parent_model.save(build_dir + "/end2end_cnv_w1a1_dataflow_parent.onnx")
sdp_node = parent_model.get_nodes_by_op_type("StreamingDataflowPartition")[0]
sdp_node = getCustomOp(sdp_node)
dataflow_model_filename = sdp_node.get_nodeattr("model")
# save the dataflow partition with a different name for easier access
dataflow_model = ModelWrapper(dataflow_model_filename)
dataflow_model.save("sat6-cnn-t1w8.onnx")

Notice the additional `RemoveCNVtoFCFlatten` transformation that was not used for TFC-w1a1. In the last Netron visualization you may have noticed a `Reshape` operation towards the end of the network, where the convolutional part of the network ends and the fully connected layers start. That `Reshape` is essentially a tensor flattening operation, which we can remove for the purposes of hardware implementation. We can examine the contents of the dataflow partition with Netron, and observe the `ConvolutionInputGenerator`, `MatrixVectorActivation` and `StreamingMaxPool_Batch` nodes that implement the sliding window, matrix multiply and maxpool operations in hlslib. *Note that the MatrixVectorActivation instances following the ConvolutionInputGenerator nodes are really implementing the convolutions, despite the name. The final three MatrixVectorActivation instances implement actual FC layers.*
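One way to see why a flattening `Reshape` carries no work in a streaming design is the sketch below. It assumes the feature map is already streamed element by element in row-major `(H, W, C)` order, and it uses a hypothetical `flatten` helper; it does not reproduce what `RemoveCNVtoFCFlatten` does internally, which may also involve layout handling.

```python
def flatten(nested):
    """Recursively flatten a nested list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


# Hypothetical 2x2 feature map with 2 channels, stored in (H, W, C) order.
feature_map = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
]

# Order in which the elements would leave a streaming layer.
streamed = [value for row in feature_map for pixel in row for value in pixel]
flattened = flatten(feature_map)
print(streamed == flattened)  # True
```

Since the element order is unchanged, the following layer can consume the stream directly.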

In [11]:
showInNetron("sat6-cnn-t1w8.onnx")

Stopping http://0.0.0.0:8081
Serving 'sat6-cnn-t1w8.onnx' at http://0.0.0.0:8081


In [12]:
#https://www.kaggle.com/code/vmarkin/advatt
# load the basic libraries needed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms as T
from tqdm import tqdm
from sklearn.metrics import accuracy_score

torch.manual_seed(4)
# render plots directly below the plotting commands
%matplotlib inline

test_data_path = '../dataset/X_test_sat6.csv'
test_label_path = '../dataset/y_test_sat6.csv'

test_qtdy = 100  

def data_read(data_path, nrows):
    data=pd.read_csv(data_path, header=None, nrows=nrows, dtype=np.uint8)
    data=data.values ## converting the data into numpy array
    return data

##Read test data
test_data=data_read(test_data_path, nrows=test_qtdy)
print("Test data shape:" + str(test_data.shape))

##Read test data labels
test_data_label=data_read(test_label_path,nrows=test_qtdy)
print("Test data label shape:" + str(test_data_label.shape))

test_data_reshaped = test_data.reshape(test_qtdy,28,28,4) 

final_test_data = np.zeros((test_qtdy, 32, 32, 4),dtype=np.float32)
final_test_data[:, :28, :28, :] = test_data_reshaped

output_tensor = []

for label in test_data_label:
    output_tensor.append(label.argmax())



input_tensor = torch.from_numpy(final_test_data)
output_tensor = torch.Tensor(output_tensor)

print(input_tensor[0].int())
print(output_tensor)
print(output_tensor.shape)

np.save("input.npy", input_tensor)
np.save("expected_output.npy", output_tensor)

class SatImgDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.transform = T.ToTensor()
    def __len__(self):
        return len(self.y)
    
    def __getitem__(self, index):
        x = self.transform(self.X[index])
        y = torch.FloatTensor(self.y[index])
        return {'x':x, 'y':y}

dataset_test = SatImgDataset(final_test_data, test_data_label)

loader_test = DataLoader(dataset_test, test_qtdy, shuffle=False)

Test data shape:(100, 3136)
Test data label shape:(100, 6)
tensor([[[ 95,  91,  61, 157],
         [105, 113, 101, 179],
         [ 50,  35,  24, 124],
         ...,
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0]],

        [[ 86,  82,  58, 146],
         [ 90,  96,  77, 161],
         [ 85,  93,  62, 159],
         ...,
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0]],

        [[122, 139, 137, 199],
         [ 90,  95,  74, 162],
         [113, 126, 116, 191],
         ...,
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0]],

        ...,

        [[  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         ...,
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0]],

        [[  0,   0,   0,   0],
         [  0,   0,   0,   0],
         [  0,   0,   0,   0],
         ...,
         [  

Hardware build

In [13]:
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil

model_file = "sat6-cnn-t1w8.onnx"

estimates_output_dir = "build_t1w8_v2"

# Delete results of a previous run, if any exist
if os.path.exists(estimates_output_dir):
    shutil.rmtree(estimates_output_dir)
    print("Previous run results deleted!")


cfg_estimates = build.DataflowBuildConfig(
    output_dir          = estimates_output_dir,
    mvau_wwidth_max     = 1000000, # previously used 80
    target_fps          = 10000, # previously used 100
    synth_clk_period_ns = 10.0,
    fpga_part           = "xc7z020clg484-1", 
    rtlsim_batch_size   = 100,
    folding_config_file = "folding.json",
    verify_input_npy    = "input.npy",
    verify_expected_output_npy = "expected_output.npy",
    verify_save_rtlsim_waveforms = True,
    # board = "Pynq-Z1",
    # shell_flow_type = build_cfg.ShellFlowType.VIVADO_ZYNQ,
    default_mem_mode = build_cfg.ComputeEngineMemMode.CONST,
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        # build_cfg.DataflowOutputType.PYNQ_DRIVER,
        # build_cfg.DataflowOutputType.BITFILE
    ],
    verify_steps=[
        build_cfg.VerificationStepType.STITCHED_IP_RTLSIM,
    ]
)
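As a back-of-the-envelope check on the configuration above, the clock period and the target frame rate together imply a cycle budget per frame. The sketch below assumes the budget is simply the clock frequency divided by the target FPS; the helper name is hypothetical, and the exact formula FINN applies internally should be confirmed against its documentation.

```python
def target_cycles_per_frame(clk_period_ns, target_fps):
    """Clock cycles available per frame at a given clock period and frame rate."""
    clock_hz = 1e9 / clk_period_ns
    return int(clock_hz / target_fps)


# 10 ns period (100 MHz) at 10000 FPS, as configured above
print(target_cycles_per_frame(10.0, 10000))  # 10000
# the same clock at 100 FPS leaves a far larger budget
print(target_cycles_per_frame(10.0, 100))  # 1000000
```

A smaller budget per frame generally requires more parallelism in each layer, which increases resource usage on the FPGA.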

In [14]:
%%time
build.build_dataflow_cfg(model_file, cfg_estimates)

Building dataflow accelerator from sat6-cnn-t1w8.onnx
Intermediate outputs will be generated in /tmp/finn_dev_artti
Final outputs will be generated in build_t1w8_v2
Build log is at build_t1w8_v2/build_dataflow.log
Running step: step_qonnx_to_finn [1/18]
Running step: step_tidy_up [2/18]
Running step: step_streamline [3/18]
Running step: step_convert_to_hls [4/18]
Running step: step_create_dataflow_partition [5/18]
Running step: step_target_fps_parallelization [6/18]
Running step: step_apply_folding_config [7/18]
Running step: step_minimize_bit_width [8/18]
Running step: step_generate_estimate_reports [9/18]
Running step: step_hls_codegen [10/18]
Running step: step_hls_ipgen [11/18]
Running step: step_set_fifo_depths [12/18]
Running step: step_create_stitched_ip [13/18]
[0] -Info: xpm_memory.sv:506: Assertion failed in TOP.finn_design_wrapper.finn_design_i.StreamingFIFO_1.fifo.inst.gen_fifo.xpm_fifo_axis_inst.xpm_fifo_base_inst.gen_sdpram.xpm_memory_base_inst.config_drc: [XPM_MEMORY 20-

0

In [15]:
device = torch.device('cpu')
array_outputs_base = []
with torch.no_grad():
    correct = 0
    i=0
    for batch in loader_test:
        pred = cnv(batch['x'].to(device))
        predicted = torch.max(pred, 1)[1]
        real_class = torch.max(batch['y'].to(device), 1)[1]
        correct += (predicted == real_class).sum()
        array_outputs_base = predicted

print(array_outputs_base.shape)
print(array_outputs_base)

torch.Size([100])
tensor([3, 2, 5, 3, 5, 3, 5, 1, 5, 5, 5, 2, 3, 5, 3, 3, 2, 5, 0, 0, 2, 3, 1, 5,
        1, 1, 5, 1, 2, 1, 5, 5, 5, 5, 0, 1, 5, 3, 3, 2, 2, 5, 2, 3, 3, 5, 0, 3,
        5, 1, 3, 1, 3, 3, 3, 2, 2, 3, 5, 5, 1, 5, 2, 5, 0, 2, 1, 1, 5, 5, 3, 3,
        5, 5, 3, 3, 5, 5, 2, 5, 5, 2, 5, 3, 3, 1, 2, 5, 2, 3, 2, 3, 3, 3, 3, 2,
        1, 1, 0, 3])


In [16]:
import numpy
import os
import re

caminho_da_pasta = f"/home/artti/Desktop/finn/notebooks/sat6_cnn/T1-8bit/{estimates_output_dir}/verification_output"

arquivos_na_pasta = os.listdir(caminho_da_pasta)

array_outputs = numpy.zeros(len(arquivos_na_pasta))

# Iterate over each file
for nome_arquivo in arquivos_na_pasta:
    caminho_arquivo = os.path.join(caminho_da_pasta, nome_arquivo)
    output = numpy.load(caminho_arquivo)
    padrao = re.compile(r"_([0-9]+)_")
    correspondencia = padrao.search(nome_arquivo)    
    numero_do_teste = int(correspondencia.group(1))
    array_outputs[numero_do_teste] = output[0]

if(len(array_outputs) == len(array_outputs_base)):
    sucess = 0
    for i in range(len(array_outputs)):
        print(f"saida hardware: {int(array_outputs[i])} x {array_outputs_base[i]} :saida software")
        if (int(array_outputs[i]) == array_outputs_base[i]):
            sucess = sucess + 1            

    precision = sucess/len(array_outputs)
    print(precision)

# low agreement -> switching from decoupled to const above and increasing area utilization DID NOT WORK
# low agreement -> removing the division by 255 at the start WORKED :D

saida hardware: 3 x 3 :saida software
saida hardware: 2 x 2 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 1 x 1 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 2 x 2 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 2 x 2 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 0 x 0 :saida software
saida hardware: 0 x 0 :saida software
saida hardware: 2 x 2 :saida software
saida hardware: 3 x 3 :saida software
saida hardware: 1 x 1 :saida software
saida hardware: 5 x 5 :saida software
saida hardware: 1 x 1 :saida software
saida hardware: 1 x 1 :saida software
saida hardwa