# Radio Modulation with FINN - Notebook #3 of 5

***Input:*** Brevitas QONNX model 

***Output:*** FINN compatible "FINN-ONNX" model

### Overview 
This notebook walks you through performing **transformations** to convert the ONNX model into a FINN-ONNX model where **all the layers are compatible with FINN** and can be **converted to HLS/RTL in the next notebook**.
1. Visualization of the model with Netron!
2. The first **transformation:** **ConvertQONNXtoFINN**
3. The second set of transformations: **Tidy transformations**
4. The **removal of an unsupported node**, and corresponding adjustments 
5. Add a top-K node to the model (essentially telling the model to output the class with the highest probability)
6. Setting of **datatypes** and saving the finn-onnx model
   

### FINN Pipeline Map
Throughout these notebooks, you will begin to understand the FINN pipeline! In order, the pipeline is:
1. Dataset and Vanilla model
2. Brevitas Model
3. **Transforming the Brevitas Model to finn.onnx** (you are here)
4. Transforming tidy.onnx to bitstream
5. Loading the bitstream on the FPGA!

### Step 0: Loading the Brevitas QONNX model from the last notebook

Reminder: 
1. **O**NNX is an open standard for representing machine learning models, with lots of tooling built around it, including an Intermediate Representation (IR) for defining and interacting with neural networks. ONNX also provides a standardized file format, so we can save and load ".onnx" files.
2. **QO**NNX, on the other hand, uses ONNX's IR for neural networks and _extends_ some of its capabilities. You have seen that Brevitas can be used to train a neural network... well, a lot of the magic behind Brevitas requires *QO*NNX and its ability to represent datatypes that are _less than 8 bits_.
3. **FINN** then takes models represented with *QO*NNX data structures as inputs and _compiles_ them. (More on this in future notebooks.)
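To make "less than 8 bits" concrete, here is a minimal NumPy sketch of arbitrary-bit-width (fake) quantization, following our understanding of the formula behind QONNX's `Quant` operator: divide by a scale, add a zero point, round, clip to the integer range of the chosen bit width, then map back to real values. The function `quant_sketch` and its parameter names are hypothetical; this is an illustration, not the QONNX implementation.

```python
import numpy as np

def quant_sketch(x, scale, zeropt=0.0, bitwidth=4, signed=True, narrow=False):
    """Fake-quantize x to an arbitrary bit width (QONNX-Quant-style formula).

    q = clip(round(x / scale + zeropt), qmin, qmax)
    y = (q - zeropt) * scale
    """
    if signed:
        qmin = -(2 ** (bitwidth - 1)) + (1 if narrow else 0)
        qmax = 2 ** (bitwidth - 1) - 1
    else:
        qmin = 0
        qmax = 2 ** bitwidth - 1 - (1 if narrow else 0)
    x = np.asarray(x, dtype=np.float64)
    q = np.clip(np.round(x / scale + zeropt), qmin, qmax)  # integer levels
    return (q - zeropt) * scale                            # back to real values

# A 3-bit signed quantizer has only 8 levels: -4..3 (times the scale)
x = np.array([-1.3, -0.2, 0.05, 0.4, 2.0])
print(quant_sketch(x, scale=0.25, bitwidth=3).tolist())
# [-1.0, -0.25, 0.0, 0.5, 0.75]
```

Every output is one of only eight representable values, which is exactly the kind of low-precision datatype FINN later turns into compact hardware.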

That is a high-level overview of where QONNX sits in the pipeline. Using their wrapper around ONNX, we can now prepare the model for the FINN compiler. The first step is to load our Brevitas model file and wrap it with a QONNX `ModelWrapper`, which exposes QONNX capabilities on a loaded *.onnx file!


In [8]:
from qonnx.core.modelwrapper import ModelWrapper
from finn.util.visualization import showInNetron
from finn.transformation.qonnx.convert_qonnx_to_finn import ConvertQONNXtoFINN
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.insert_topk import InsertTopK
import onnx
from qonnx.core.datatype import DataType
from qonnx.transformation.general import (
    GiveReadableTensorNames,
    GiveUniqueNodeNames,
)

#Path to the qonnx model exported by brevitas
brevitas_model_pth='27ml_rf/models/radio_27ml_brevitas.onnx'
model=ModelWrapper(brevitas_model_pth)

# Here we print 2 required environment variables. These should point to your Vivado and Vitis HLS installation paths
# (see the output below), which provide required libraries for FINN.
!echo $VIVADO_PATH
!echo $HLS_PATH

/tools/Xilinx/Vivado/2024.1
/tools/Xilinx/Vitis_HLS/2024.1


### Step 1: Visualizing the Brevitas ONNX model using `showInNetron`

***Netron*** is a web app that displays *a graph of a neural network*!

Using it, we can observe the nodes, their properties, and the connections in the neural network. This is particularly helpful in this pipeline because some nodes are _not supported by FINN_. To fix this, we inspect the nodes in the Netron output and apply a series of transformations that preserve the model's overall architecture while making it compatible with *FINN*!

In [2]:
host_machine_ip='localhost'
assert host_machine_ip != 'your host machine IP', 'host_machine_ip not set'
showInNetron(brevitas_model_pth,localhost_url=host_machine_ip, port=8081)

Serving '27ml_rf/models/radio_27ml_brevitas.onnx' at http://0.0.0.0:8081


### Step 2: Converting the model to FINN-ONNX model

We call an **ONNX** model **FINN-ONNX** when it _only has layers and an architecture supported by FINN_.

In the cell below we start this conversion. Specifically, the first step is to apply the `ConvertQONNXtoFINN` transformation. It takes the model currently wrapped as a QONNX model and returns a transformed model that we can then progressively transform until it is ready for FINN. It also performs several transformations automatically under the hood: for instance, the QONNX quantization nodes on the weights are folded into the weights themselves, and those on the activations become `MultiThreshold` nodes.

For example, ONNX has an operator called `Gemm`, which stands for General Matrix Multiplication. The documentation is here: https://github.com/onnx/onnx/blob/main/docs/Changelog.md#Gemm-9

FINN does not support `Gemm`, and because the QONNX network still contains `Gemm` nodes, they must be converted to standard `MatMul` nodes. `ConvertQONNXtoFINN` takes care of this for us.
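Why is this conversion safe? ONNX `Gemm` computes `Y = alpha * A' @ B' + beta * C`, so for a fully connected layer it is just a matrix multiplication plus a bias. The minimal NumPy sketch below (our own illustration, not the code FINN runs) shows that a `Gemm` with `transB=1`, the form in which a PyTorch linear layer is typically exported, gives the same result as transposing the constant weights once and then doing `MatMul` followed by `Add`. The layer sizes are hypothetical.

```python
import numpy as np

def gemm(A, B, C, alpha=1.0, beta=1.0, transA=0, transB=0):
    # ONNX Gemm semantics: Y = alpha * A' @ B' + beta * C,
    # where A' / B' are A / B transposed if transA / transB is set.
    A_ = A.T if transA else A
    B_ = B.T if transB else B
    return alpha * (A_ @ B_) + beta * C

# A fully connected layer: weights stored as (out_features, in_features),
# hence transB=1. Sizes here are hypothetical.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 128))   # one input vector (batch of 1)
W = rng.standard_normal((27, 128))  # e.g. one output per class
b = rng.standard_normal(27)

y_gemm = gemm(x, W, b, transB=1)

# MatMul-based equivalent: transpose the constant weights ahead of time,
# then MatMul followed by Add for the bias.
W_t = W.T                           # shape (128, 27)
y_matmul = np.matmul(x, W_t) + b

print(np.allclose(y_gemm, y_matmul))  # True
```

Because the weights are constants, the transpose can be folded away before hardware generation, leaving only operations FINN understands.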

In [3]:
model = model.transform(ConvertQONNXtoFINN())
finn_model_pth ='27ml_rf/models/radio_27ml_inital_finn.onnx'
model.save(finn_model_pth)



### Visualizing the FINN-ONNX model
Take a moment to notice the changes below: **the graph looks very different**, and all of this was handled by the first transformation.

FINN handles a lot of the transformation work in its backend. What is generally left to us is knowing which manual transformations we must make, and why. In the next steps we look at the first of these manual transformations.

***Notice*** the "MultiThreshold" node added by the FINN backend. This node uses thresholds, determined by FINN from our quant nodes, to "bin" input values into the range of values defined by the quantization!

You can think of a MultiThreshold node as an ADC (analog-to-digital converter): it takes the input and compares it against a set of thresholds to quantize it. Specifically, the input is "binned" to a quantized value based on thresholds that the FINN backend derives from our quantizer layers!

The connection between ADCs and quantization is interesting! Both "step down" a signal that can take many values (in the ADC's case, a continuous signal) to a discrete signal with a small number of levels.
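The binning idea can be sketched in a few lines of NumPy: for every input value, count how many of its channel's (ascending) thresholds it reaches, then apply an optional scale and bias to that count. This is a simplified illustration written for this notebook (`multithreshold_sketch` is a hypothetical name), not FINN's implementation; the real node has its own attributes, such as `out_scale` and `out_bias`, and its own data-layout handling.

```python
import numpy as np

def multithreshold_sketch(x, thresholds, out_scale=1.0, out_bias=0.0):
    """Simplified MultiThreshold: count how many thresholds each value reaches.

    x:          array of shape (N, C, ...) with channels on axis 1
    thresholds: array of shape (C, T), ascending per channel; a single row
                of shape (1, T) is shared by all channels
    Returns out_scale * count + out_bias, with the same shape as x.
    """
    x = np.asarray(x, dtype=np.float64)
    thresholds = np.asarray(thresholds, dtype=np.float64)
    if thresholds.shape[0] == 1:
        thresholds = np.repeat(thresholds, x.shape[1], axis=0)
    x_cl = np.moveaxis(x, 1, -1)                              # channels last
    counts = (x_cl[..., None] >= thresholds).sum(axis=-1)     # "bin" each value
    return np.moveaxis(out_scale * counts + out_bias, -1, 1)  # channels back

# A 2-bit example: three thresholds give four output levels (0..3)
x = np.array([[[-1.0, 0.2, 0.7, 1.6, 3.9]]])  # shape (1, 1, 5): one channel
th = [[0.5, 1.5, 2.5]]
print(multithreshold_sketch(x, th).tolist())                 # [[[0.0, 0.0, 1.0, 2.0, 3.0]]]
print(multithreshold_sketch(x, th, out_bias=-2.0).tolist())  # [[[-2.0, -2.0, -1.0, 0.0, 1.0]]]
```

Like an ADC, the output only "sees" which interval between thresholds the input fell into; the bias in the second call shifts the levels to a signed range.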

In [5]:
showInNetron(finn_model_pth,localhost_url=host_machine_ip, port=8081)

Stopping http://0.0.0.0:8081
Serving '27ml_rf/models/radio_27ml_inital_finn.onnx' at http://0.0.0.0:8081


### Step 3: Tidy Transformations
This step is called "Tidy". Specifically, we apply a list of 4 transformations that "tidy up" the model graph. This is _required_ for the pipeline (and I wish FINN did it automatically... maybe we can open a PR for this?).

In the Netron cell above, click on the first "Add" node you see and look at the bold "INPUT" header. There you'll see that the names of the inputs to the "Add" node are something like pTtSzD; in other words, non-descriptive names.

The Tidy transformations fix this! Each node, and each tensor flowing between nodes, receives a more readable name. We also:
1. Infer the shape of each tensor
2. Infer the FINN datatype of each tensor and annotate it

These are stored as annotations in the model graph, and they can be viewed by inspecting the model in Netron below.

***Do note:*** this step is not optional; it is _required_ for the FINN pipeline, because later steps in the FINN backend rely on these annotations.

***TODO:*** The most recent version of FINN _already_ annotates the shapes of node inputs and outputs, so inferring them in a Tidy transformation may be deprecated. For now, this step should be kept just in case.
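The renaming pattern shows up in this notebook's own output later on, where the input tensor ends up named `Add_0_out0`: nodes are named `<op_type>_<counter>`, and a node's outputs are named `<node name>_out<index>`. The toy sketch below reproduces only that naming pattern on a hand-written list of nodes. `tidy_names_sketch` is a hypothetical helper, not a FINN/QONNX function, and it ignores everything else `GiveUniqueNodeNames` and `GiveReadableTensorNames` do (for example, naming parameters and the global inputs/outputs).

```python
from collections import defaultdict

def tidy_names_sketch(nodes):
    """Toy re-implementation of the naming *pattern* only.

    nodes: list of dicts {"op_type": str, "outputs": [str, ...]} in graph order.
    Returns a list of (node_name, [readable output tensor names]).
    """
    counters = defaultdict(int)
    renamed = []
    for node in nodes:
        op = node["op_type"]
        name = f"{op}_{counters[op]}"  # unique per op type: Add_0, Add_1, ...
        counters[op] += 1
        outs = [f"{name}_out{i}" for i in range(len(node["outputs"]))]
        renamed.append((name, outs))
    return renamed

toy_graph = [  # random-looking IDs, like the ones seen in Netron
    {"op_type": "MultiThreshold", "outputs": ["pTtSzD"]},
    {"op_type": "Add", "outputs": ["Xk3qLw"]},
    {"op_type": "Conv", "outputs": ["a9Bv2e"]},
    {"op_type": "Add", "outputs": ["Qz81mN"]},
]
for name, outs in tidy_names_sketch(toy_graph):
    print(name, outs)
```

Counting per op type is what keeps names unique (`Add_0`, `Add_1`) while still telling you at a glance what each node does.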

In [6]:
# Reload our FINN model. 
finn_model = ModelWrapper(finn_model_pth)

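# Define our transforms
transforms = [
    InferShapes(),              # annotate every tensor with its shape
    InferDataTypes(),           # annotate every tensor with its FINN datatype
    GiveUniqueNodeNames(),      # e.g. "Add_0", "Conv_0", ...
    GiveReadableTensorNames(),  # e.g. "Add_0_out0" instead of random IDs
]

# Apply the transforms in order; each call returns the transformed model
for transform in transforms:
    finn_model = finn_model.transform(transform)

# Run QONNX's cleanup transformations (e.g. removing unused tensors)
finn_model = finn_model.cleanup()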

pre_net_surgery_pth='27ml_rf/models/radio_27ml_pre_nw_surgery.onnx'
finn_model.save(pre_net_surgery_pth)

In [7]:
showInNetron(pre_net_surgery_pth,localhost_url=host_machine_ip, port=8081)

Stopping http://0.0.0.0:8081
Serving '27ml_rf/models/radio_27ml_pre_nw_surgery.onnx' at http://0.0.0.0:8081


## Step 4: Network Surgery
In the graph above, notice the first "MultiThreshold". The FINN backend created this node, but we must remove it!

Specifically, FINN does not support a model whose first nodes are "input" -> "MultiThreshold", i.e. where the model itself quantizes the raw input. To fix this, we manually remove the "MultiThreshold" (together with the "Add" that follows it) and make the output tensor of the "Add" the new model input.

The change in the network, reflected by the diagram in Netron will look like this:

**Original:** `input`--> `MultiThreshold`--> `Add`--> `Conv` -->...

**Post-Surgery:** `Add` output tensor (new model input) --> `Conv` --> ...

However, in the original model, the `Conv` node receives data that has already been quantized by `MultiThreshold` and `Add`. Therefore, instead of passing the raw data to our new model, we will manually perform what these nodes do (quantizing the data) outside the model, then pass the quantized data to the new model as input.
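As a minimal sketch of that preprocessing step, the NumPy function below maps raw floating-point samples to INT8 values (the datatype we set for the new input later in this notebook). It assumes the removed input quantizer is a signed 8-bit, per-tensor quantizer with zero point 0; the function name `quantize_input_int8` and the scale value are hypothetical, so read the real scale from your trained Brevitas model.

```python
import numpy as np

def quantize_input_int8(x, scale):
    """Quantize raw float samples to the INT8 values the trimmed model expects.

    Assumes a signed 8-bit, per-tensor input quantizer with zero point 0;
    `scale` is hypothetical here and must come from your Brevitas model.
    """
    q = np.round(np.asarray(x, dtype=np.float32) / scale)  # to integer grid
    return np.clip(q, -128, 127).astype(np.int8)           # saturate to INT8

# Example: one random IQ frame shaped like the model input [1, 2, 1024]
rng = np.random.default_rng(0)
iq_frame = rng.normal(0.0, 0.5, size=(1, 2, 1024)).astype(np.float32)
q_frame = quantize_input_int8(iq_frame, scale=1.0 / 64)  # hypothetical scale
print(q_frame.dtype, q_frame.shape)  # int8 (1, 2, 1024)
```

Clipping before the cast matters: casting out-of-range floats straight to `int8` would not saturate them to -128/127.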

Further information can be found in this FINN GitHub discussion: https://github.com/Xilinx/finn/discussions/420

**Notice**: This network surgery only works for this specific model architecture (VGG-10). If a different architecture is used, the network surgery might be different or not required. Ultimately, the goal is to get a streamlined model before generating hardware layers and this provides a good example.

#### Perform the surgery
In the cell below we perform the network surgery. First, we find the first `Conv` node, its input tensor (the output of the `Add` node), and the original model input. Then we (1) replace the original model input with the `Add` node's output tensor and (2) delete every node before the first `Conv` (the `MultiThreshold` and the `Add`).


In [7]:
# Find the first 'Conv' node
first_conv_node = finn_model.get_nodes_by_op_type("Conv")[0]
# Get the value info of that 'Conv' node's input tensor (here, the output of the 'Add' node)
new_input_tensor = finn_model.get_tensor_valueinfo(first_conv_node.input[0])

# Find the original input tensor of the model
old_input_tensor = finn_model.graph.input[0]

# Remove the old input tensor and make the 'Add' output tensor the new model input
finn_model.graph.input.remove(old_input_tensor)
finn_model.graph.input.append(new_input_tensor)

# Find the index of the first 'Conv' node and delete every node before it.
# Here those are the 'MultiThreshold' (index 0) and 'Add' (index 1) nodes,
# so afterwards the 'Conv' node has index 0 and reads directly from the new input.
new_input_index = finn_model.get_node_index(first_conv_node)
del finn_model.graph.node[0:new_input_index]

pre_net_surgery_pth='27ml_rf/models/radio_27ml_pre_nw_surgery.onnx'
finn_model.save(pre_net_surgery_pth)
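To see exactly which nodes that slice removes, here is a pure-Python toy version of the same logic on a hand-written node list. The tuples stand in for ONNX nodes, and the tensor names are hypothetical apart from `Add_0_out0`, which matches the input name printed later in this notebook; the sketch mirrors only the indexing, not the `ModelWrapper` API.

```python
# Toy stand-ins for ONNX nodes: (name, op_type, input tensors, output tensors)
nodes = [
    ("MultiThreshold_0", "MultiThreshold", ["global_in", "thresh"], ["MultiThreshold_0_out0"]),
    ("Add_0", "Add", ["MultiThreshold_0_out0", "shift"], ["Add_0_out0"]),
    ("Conv_0", "Conv", ["Add_0_out0", "weights"], ["Conv_0_out0"]),
]
graph_inputs = ["global_in"]

# Same steps as the surgery cell: locate the first Conv and its input tensor...
first_conv_index = next(i for i, n in enumerate(nodes) if n[1] == "Conv")
new_input = nodes[first_conv_index][2][0]

# ...swap the graph input, then drop every node that comes before the Conv.
graph_inputs = [new_input]
removed = [n[1] for n in nodes[:first_conv_index]]
del nodes[:first_conv_index]

print(graph_inputs)  # ['Add_0_out0']
print(removed)       # ['MultiThreshold', 'Add']
print(nodes[0][0])   # Conv_0
```

Note that the `Add` node itself is deleted; only its output tensor survives, as the new graph input.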

In [8]:
showInNetron(pre_net_surgery_pth,localhost_url=host_machine_ip, port=8081)

Stopping http://0.0.0.0:8081
Serving '27ml_rf/models/radio_27ml_pre_nw_surgery.onnx' at http://0.0.0.0:8081


## Step 5: Insert TopK node.

The output of our model is 27 values, one for each of the 27 classes, and the class with the greatest value is the predicted class.

We must explicitly "tell" the model to return just a single value: the predicted class. To do this we add a TopK node, which returns the indices of the k largest of the 27 values. Because we only need the predicted class, we set k=1, so the model returns just the index of the most probable class.
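What a TopK node with k=1 computes can be sketched in NumPy: sort the scores in descending order and keep the first k indices. The helper `topk_indices` below is hypothetical and only mimics the index output of an ONNX TopK node with `largest=1`; with k=1 it is the same as `argmax`.

```python
import numpy as np

def topk_indices(scores, k=1):
    """Indices of the k largest scores along the last axis, largest first.

    Mimics the *index* output of a TopK node with largest=1;
    with k=1 it is the same as argmax.
    """
    order = np.argsort(-scores, axis=-1, kind="stable")  # descending order
    return order[..., :k]

scores = np.array([[0.1, 2.3, -0.7, 1.9]])  # toy 4-class output (not 27)
print(topk_indices(scores, k=1).tolist())   # [[1]]
print(topk_indices(scores, k=2).tolist())   # [[1, 3]]
```

Returning an index rather than 27 scores also means the hardware only has to send a single small integer back per input frame.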

**Also note:** there was an issue with FINN rejecting models because of redundant information in the model graph. Specifically, the primary input and output tensors were also listed in the graph's `value_info`, so in the cell below we show how that redundant information can be removed.

In [9]:
# remove redundant value_info for primary input/output
# otherwise, newer FINN versions will not accept the model
if finn_model.graph.input[0] in finn_model.graph.value_info:
    finn_model.graph.value_info.remove(finn_model.graph.input[0])
if finn_model.graph.output[0] in finn_model.graph.value_info:
    finn_model.graph.value_info.remove(finn_model.graph.output[0])

# insert topK node, with k=1, meaning we pick the 1 classification with highest prediction value
finn_model = finn_model.transform(InsertTopK(k=1))

### Handling compatibility and setting the input datatype

In [10]:
# InsertTopK changed the graph output, so remove redundant value_info for primary input/output again
# otherwise, newer FINN versions will not accept the model
if finn_model.graph.input[0] in finn_model.graph.value_info:
    finn_model.graph.value_info.remove(finn_model.graph.input[0])
if finn_model.graph.output[0] in finn_model.graph.value_info:
    finn_model.graph.value_info.remove(finn_model.graph.output[0])

# manually set input datatype (not done by brevitas yet)
finnonnx_in_tensor_name = finn_model.graph.input[0].name
finnonnx_model_in_shape = finn_model.get_tensor_shape(finnonnx_in_tensor_name)
finn_model.set_tensor_datatype(finnonnx_in_tensor_name, DataType["INT8"])

### Export the model

In [11]:
print("Input tensor name: %s" % finnonnx_in_tensor_name)
print("Input tensor shape: %s" % str(finnonnx_model_in_shape))
print("Input tensor datatype: %s" % str(finn_model.get_tensor_datatype(finnonnx_in_tensor_name)))

# save modified model that is now ready for the FINN compiler
tidy_model_pth='27ml_rf/models/radio_27ml_finn.onnx'
finn_model.save(tidy_model_pth)
print("Modified FINN-ready model saved to %s" % tidy_model_pth)

Input tensor name: Add_0_out0
Input tensor shape: [1, 2, 1024]
Input tensor datatype: INT8
Modified FINN-ready model saved to 27ml_rf/models/radio_27ml_finn.onnx


## Visualise the new model
Notice that the new model's input is now the output tensor of the old `Add` node (`Add_0_out0`), which feeds the first `Conv` directly. Everything from the old input node up to and including the `Add` node has been removed.

In [12]:
#Visualise the new model
showInNetron(tidy_model_pth,localhost_url=host_machine_ip, port=8081)

Stopping http://0.0.0.0:8081
Serving '27ml_rf/models/radio_27ml_finn.onnx' at http://0.0.0.0:8081
