# FINN - End-to-End Flow
-----------------------------------------------------------------
This notebook gives an overview of the end-to-end flow of FINN: from loading an ONNX model exported from Brevitas, through the numerous transformations in FINN, up to the generation of a bitstream that can be used to program an FPGA.

We'll use the following `showSrc` function to print the source code of functions and classes in the Jupyter notebook.

In [8]:
import inspect

def showSrc(what):
    print("".join(inspect.getsourcelines(what)[0]))

## Outline
-------------
1. Preparation of model to pass to FINN
2. FINN transformations
    * first transformations
    * Streamline
    * last transformations
3. Verification
4. Bitstream generation

### 1. Preparation of model to pass to FINN
FINN expects an ONNX model as input. This can be a model trained with [Brevitas](https://github.com/Xilinx/brevitas). Brevitas is a PyTorch library for quantization-aware training, and the FINN Docker image comes with several [example Brevitas networks](https://github.com/maltanar/brevitas_cnv_lfc). To show the FINN end-to-end flow, we'll use the LFC-w1a1 model as the example network. The Brevitas export is only briefly described here; for details see the Jupyter notebook [3-FINN-Brevitas-network-import](3-FINN-Brevitas-network-import.ipynb).

First a few things have to be imported. Then the model can be loaded with the pretrained weights.

In [2]:
import torch
import brevitas.onnx as bo
from models.LFC import LFC

lfc = LFC(weight_bit_width=1, act_bit_width=1, in_bit_width=1)
trained_lfc_checkpoint = ("/workspace/brevitas_cnv_lfc/pretrained_models/LFC_1W1A/checkpoints/best.tar")
checkpoint = torch.load(trained_lfc_checkpoint, map_location="cpu")
lfc.load_state_dict(checkpoint["state_dict"])
bo.export_finn_onnx(lfc, (1, 1, 28, 28), "lfc_w1_a1.onnx")



The model has now been loaded with the pretrained weights, exported, and saved under the name "lfc_w1_a1.onnx".
To visualize the exported model, Netron can be used. Netron is a visualizer for neural networks that allows interactive investigation of network properties. For example, you can click on individual nodes and view their properties.

In [4]:
import netron
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")

Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081


In [4]:
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>

Now that we have the model in .onnx format, we can work with it using FINN. For that, FINN's `ModelWrapper` is used. It is a wrapper around the ONNX model that provides several helper functions to make it easier to work with the model. For details see the Jupyter notebook [2-FINN-ModelWrapper](2-FINN-ModelWrapper.ipynb).

In [1]:
from finn.core.modelwrapper import ModelWrapper
model = ModelWrapper("lfc_w1_a1.onnx")

Now the model is prepared and can be processed in different ways. FINN is built around analysis and transformation passes, which can be applied to the model. An analysis pass extracts specific information about the model and returns it to the user in the form of a dictionary. For more details see [4-FINN-HowToAnalysisPass](4-FINN-HowToAnalysisPass.ipynb). A transformation pass changes the model and returns the changed model back to the FINN flow. For more information about transformation passes see the notebook [5-FINN-HowToTransformationPass](5-FINN-HowToTransformationPass.ipynb).
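To make the difference between the two kinds of passes concrete, here is a minimal sketch on a toy graph. The graph is a plain list of dictionaries rather than an ONNX model, and the names `count_op_types` and `convert_sub_to_add` are hypothetical helpers written for this illustration, not FINN functions.

```python
# Toy illustration only: the graph format and both functions are made up
# for this sketch and are not part of the FINN API.
from collections import Counter
import copy


def count_op_types(graph):
    """Analysis pass: only reads the graph and returns a dictionary."""
    return dict(Counter(node["op"] for node in graph))


def convert_sub_to_add(graph):
    """Transformation pass: returns a changed copy of the graph plus a flag
    that tells whether anything was modified."""
    new_graph = copy.deepcopy(graph)
    modified = False
    for node in new_graph:
        if node["op"] == "Sub":
            # x - c is the same as x + (-c)
            node["op"] = "Add"
            node["const"] = -node["const"]
            modified = True
    return new_graph, modified


graph = [
    {"op": "Mul", "const": 2.0},
    {"op": "Sub", "const": 1.0},
    {"op": "MatMul", "const": None},
]

print(count_op_types(graph))
new_graph, modified = convert_sub_to_add(graph)
print(count_op_types(new_graph), modified)
```

The analysis function leaves the graph untouched, while the transformation returns a new graph; the original `graph` still contains the `Sub` node afterwards.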

Since the goal in this notebook is to process the model to such an extent that a bitstream can be generated from it, the focus is on the transformations that are necessary for this. In the next section these are discussed in more detail.

### 2. FINN transformations
#### First transformations
* InferShapes
* FoldConstants
* GiveUniqueNodeNames
* GiveReadableTensorNames

The first transformations applied to the model are listed above. First, `InferShapes` is applied: in this transformation all tensor shapes are derived and set in the ValueInfo of the model. In the next step constants are folded. This means that a node whose inputs are all constants is evaluated ahead of time, and its precomputed output is passed to the next node as a constant input. Afterwards the nodes and tensors are given unique and readable names. This sequence can be seen below; the model is then saved and visualized with Netron.
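The idea behind constant folding can be sketched on a toy graph as follows. This is an illustration of the general technique under simplifying assumptions (two-input nodes, nodes already in topological order); the graph format and the `fold_constants` helper are hypothetical and do not reflect how FINN's `FoldConstants` is implemented.

```python
# Toy illustration only: a tiny graph format invented for this sketch.
import operator

OPS = {"Add": operator.add, "Mul": operator.mul}


def fold_constants(nodes, constants):
    """Evaluate every node whose inputs are all known constants and drop it
    from the graph; its output is stored as a new constant instead."""
    constants = dict(constants)  # do not modify the caller's dictionary
    remaining = []
    for node in nodes:  # nodes are assumed to be in topological order
        if all(name in constants for name in node["inputs"]):
            a, b = (constants[name] for name in node["inputs"])
            constants[node["output"]] = OPS[node["op"]](a, b)
        else:
            remaining.append(node)
    return remaining, constants


nodes = [
    # both inputs are constants, so this node can be folded
    {"op": "Mul", "inputs": ["two", "scale"], "output": "k"},
    # depends on the runtime input x, so this node has to stay
    {"op": "Add", "inputs": ["x", "k"], "output": "y"},
]
constants = {"two": 2.0, "scale": 0.5}

folded_nodes, folded_constants = fold_constants(nodes, constants)
print([n["op"] for n in folded_nodes])
print(folded_constants["k"])
```

After folding, only the `Add` node remains, and it reads the precomputed value `k` as a constant input.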

In [2]:
from finn.transformation.infer_shapes import InferShapes
from finn.transformation.fold_constants import FoldConstants
from finn.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames

model = model.transform(InferShapes())
model = model.transform(FoldConstants())
model = model.transform(GiveUniqueNodeNames())
model = model.transform(GiveReadableTensorNames())

In [5]:
model.save("lfc_w1_a1.onnx")
netron.start("lfc_w1_a1.onnx", port=8081, host="0.0.0.0")


Stopping http://0.0.0.0:8081
Serving 'lfc_w1_a1.onnx' at http://0.0.0.0:8081


In [6]:
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>

#### Streamline
* ConvertSubToAdd
* BatchNormToAffine
* ConvertSignToThres
* MoveScalarAddPastMatMul
* MoveScalarMulPastMatMul
* MoveAddPastMul
* CollapseRepeatedAdd
* CollapseRepeatedMul
* AbsorbAddIntoMultiThreshold
* FactorOutMulSignMagnitude
* AbsorbMulIntoMultiThreshold
* Absorb1BitMulIntoMatMul
* RoundAndClipThresholds

After these transformations, the most complex transformation step follows: the streamlining transformation is applied to the model. For details see arXiv:1709.04060. The sequence of transformations it contains is listed above and can also be inspected with `showSrc()`, as done below.

In [9]:
from finn.transformation.streamline import Streamline
showSrc(Streamline)

class Streamline(Transformation):
    """Apply the streamlining transform, see arXiv:1709.04060."""

    def apply(self, model):
        streamline_transformations = [
            ConvertSubToAdd(),
            BatchNormToAffine(),
            ConvertSignToThres(),
            MoveScalarAddPastMatMul(),
            MoveScalarMulPastMatMul(),
            MoveAddPastMul(),
            CollapseRepeatedAdd(),
            CollapseRepeatedMul(),
            AbsorbAddIntoMultiThreshold(),
            FactorOutMulSignMagnitude(),
            AbsorbMulIntoMultiThreshold(),
            Absorb1BitMulIntoMatMul(),
            RoundAndClipThresholds(),
        ]
        for trn in streamline_transformations:
            model = model.transform(trn)
            model = model.transform(GiveUniqueNodeNames())
            model = model.transform(GiveReadableTensorNames())
            model = model.transform(InferDataTypes())
        return (model, False)



You can see in the code that the streamlining transformation consists of several individual transformations. The individual transformations are described in more detail below. After each transformation step, three further transformations are applied to the model: `GiveUniqueNodeNames()` and `GiveReadableTensorNames()` make the node and tensor names unique and readable again, and `InferDataTypes()` derives and sets the data type of each tensor.
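Many of the streamlining steps rest on simple algebraic identities. As a hedged example, a step such as `MoveScalarMulPastMatMul` can rely on the fact that a scalar factor commutes with a matrix multiplication, i.e. `(a * x) @ W == a * (x @ W)`. The sketch below checks this identity numerically in plain Python; the `matmul` helper and the values are made up for illustration and are not FINN code.

```python
# Toy check of the identity (a * x) @ W == a * (x @ W) using plain lists.
import math


def matmul(x, w):
    """Multiply a row vector x (list) with a matrix w (list of rows)."""
    return [
        sum(xi * w[i][j] for i, xi in enumerate(x))
        for j in range(len(w[0]))
    ]


a = 0.5
x = [1.0, -2.0, 3.0]
w = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]

before = matmul([a * xi for xi in x], w)  # scalar Mul followed by MatMul
after = [a * yi for yi in matmul(x, w)]   # MatMul followed by scalar Mul

print(before, after)
print(all(math.isclose(p, q) for p, q in zip(before, after)))
```

Moving scalar operations past the matrix multiplication in this way brings them next to each other, so that later steps such as `CollapseRepeatedMul` can merge them.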

#### Last transformations
* ConvertBipolarMatMulToXnorPopcount
* AbsorbAddIntoMultiThreshold
* AbsorbMulIntoMultiThreshold
* RoundAndClipThresholds
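
The name of the first entry in the list above, `ConvertBipolarMatMulToXnorPopcount`, refers to a well-known identity: if bipolar values {-1, +1} are encoded as bits {0, 1}, the dot product of two length-N vectors equals `2 * popcount(xnor(a, b)) - N`. The following sketch verifies that identity on a small example. It only illustrates the arithmetic; the helper names are hypothetical, and the way FINN represents the resulting nodes is not shown here.

```python
# Toy check: bipolar dot product versus XNOR-popcount on the bit encoding.

def bipolar_dot(a, b):
    """Ordinary dot product of two vectors with entries in {-1, +1}."""
    return sum(ai * bi for ai, bi in zip(a, b))


def to_bits(v):
    """Encode bipolar values as bits: -1 -> 0, +1 -> 1."""
    return [(e + 1) // 2 for e in v]


def xnor_popcount_dot(a_bits, b_bits):
    """Count the positions where the bits agree (popcount of XNOR) and
    map the count back to the value of the bipolar dot product."""
    n = len(a_bits)
    matches = sum(1 for ab, bb in zip(a_bits, b_bits) if ab == bb)
    return 2 * matches - n


a = [1, -1, -1, 1]
b = [1, 1, -1, -1]

print(bipolar_dot(a, b))
print(xnor_popcount_dot(to_bits(a), to_bits(b)))
```

Both computations give the same result, which is why a bipolar matrix multiplication can be expressed with XNOR and popcount operations that map well to FPGA logic.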

### 3. Verification

### 4. Bitstream generation