# FINN - Code Generation and Compilation
-----------------------------------------------------------------
<font size="3">This notebook is about code generation and compilation to enable execution of FINN custom operation nodes. 

The following `showSrc` function is used to print the source code of functions in the Jupyter notebook:</font>

In [1]:
import inspect

def showSrc(what):
    print("".join(inspect.getsourcelines(what)[0]))

## Outline
-------------
* <font size="3">Example model</font>
* <font size="3">Code generation</font>
* <font size="3">Compilation</font>
* <font size="3">CustomOp node execution</font>
* <font size="3">Conclusion</font>

### Example model
<font size="3">To show the code generation and compilation of a node, an example model with a StreamingFCLayer_Batch node is first created. To learn more about FINN custom operation nodes, please take a look at notebook [FINN-CustomOps](FINN-CustomOps.ipynb).

First TensorProto and helper are imported from ONNX. These can be used to create tensors, nodes, graphs and models in ONNX. Additional functions from `util` and the classes `DataType` and `ModelWrapper` are needed. More information about `DataType` and `ModelWrapper` can be found in Jupyter notebook [FINN-ModelWrapper](FINN-ModelWrapper.ipynb).</font>

In [2]:
from onnx import TensorProto, helper
import finn.core.utils as util
from finn.core.datatype import DataType
from finn.core.modelwrapper import ModelWrapper

<font size="3">Then all parameters needed to create a StreamingFCLayer_Batch node are set. To keep the example clear, small values are chosen. For more information about the parameters, please take a look at the documentation of the [finn-hls library](https://finn-hlslib.readthedocs.io/en/latest/).</font>

In [3]:
idt = wdt = odt = DataType.BIPOLAR
mw = 8
mh = 8
pe = 4
simd = 4
nf = mh // pe
sf = mw // simd
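
<font size="3">Both `nf` and `sf` are obtained by integer division, so a remainder would be dropped silently. The following sketch assumes that `pe` is meant to divide `mh` and `simd` is meant to divide `mw`, and rejects other combinations. The helper name `folding_factors` is hypothetical and not part of FINN.</font>

```python
def folding_factors(mw, mh, pe, simd):
    """Return (nf, sf) for a matrix with mw inputs and mh outputs.

    Hypothetical helper, not part of FINN: it only reproduces the
    integer divisions from the cell above and rejects uneven splits.
    """
    if mh % pe != 0:
        raise ValueError("pe=%d does not divide mh=%d" % (pe, mh))
    if mw % simd != 0:
        raise ValueError("simd=%d does not divide mw=%d" % (simd, mw))
    return mh // pe, mw // simd


nf, sf = folding_factors(mw=8, mh=8, pe=4, simd=4)
print(nf, sf)  # 2 2
```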


<font size="3">A `tensor_value_info` is created for all tensors involved. In this case there is one tensor for the weights besides the input and output tensors. Then an input list is created containing the two inputs (`"inp"` and `"weights"`).

**Note**: A StreamingFCLayer_Batch node can also have an output activation, which is passed in the form of thresholds as an additional input tensor.</font>

In [4]:
inp = helper.make_tensor_value_info("inp", TensorProto.FLOAT, [1, sf, simd])
weights = helper.make_tensor_value_info("weights", TensorProto.FLOAT, [mw, mh])
outp = helper.make_tensor_value_info("outp", TensorProto.FLOAT, [1, nf, pe])
node_inp_list = ["inp", "weights"]

<font size="3">Now the node can be created. The operation type is set to `"StreamingFCLayer_Batch"` and the rest of the attributes are set appropriately. The relevant attributes for the activation of the code generation and compilation are:</font>
* <font size="3">**`domain="finn"`**: specifies that the created node is a FINN-Custom Op</font>
* <font size="3">**`backend="fpgadataflow"`**: specifies that it is a node that corresponds to a function in the finn-hls library</font>
* <font size="3">**`code_gen_dir`**: specifies the path to the directory that contains the generated C++ files (is set during code generation)</font>
* <font size="3">**`executable_path`**: specifies the path to the executable created after compilation (is set during compilation)</font>

In [5]:
FCLayer_node = helper.make_node(
        "StreamingFCLayer_Batch",
        node_inp_list,
        ["outp"],
        domain="finn",
        backend="fpgadataflow",
        code_gen_dir="",
        executable_path="",
        resType="ap_resource_lut()",
        MW=mw,
        MH=mh,
        SIMD=simd,
        PE=pe,
        noActivation=1,
        binaryXnorMode=1,
        inputDataType=idt.name,
        weightDataType=wdt.name,
        outputDataType=odt.name,
)

<font size="3">The node is wrapped in a graph, and the graph inputs and outputs are set.</font>

In [6]:
graph = helper.make_graph(
        nodes=[FCLayer_node], name="fclayer_graph", inputs=[inp], outputs=[outp]
    )

<font size="3">A model is now created from the graph, which is then converted into a ModelWrapper object for further processing in FINN. Afterwards the ModelWrapper internal functions can be used to set the FINN data types and the initializer for the weights. Since this is an example, the weights are not taken from the training, but random values are generated using the utility function `gen_finn_dt_tensor()`. This function gets a FINN datatype and a shape and generates a tensor with values of this datatype in the desired shape.</font>


In [7]:
model = helper.make_model(graph, producer_name="fclayer-model")
model = ModelWrapper(model)

model.set_tensor_datatype("inp", idt)
model.set_tensor_datatype("outp", odt)
model.set_tensor_datatype("weights", wdt)
W = util.gen_finn_dt_tensor(wdt, (mw, mh))
model.set_initializer("weights", W)
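
<font size="3">To illustrate what a random tensor of a given FINN datatype and shape looks like, the sketch below builds a bipolar matrix with the standard library only. It is not the implementation of `gen_finn_dt_tensor()`. The helper `random_bipolar` is hypothetical, it returns nested lists instead of a numpy array, and it assumes that bipolar values are drawn from {-1, +1}.</font>

```python
import random


def random_bipolar(shape, seed=None):
    """Return a nested list of the given 2-D shape with entries -1 or +1.

    Hypothetical stand-in for a random bipolar tensor generator.
    """
    rng = random.Random(seed)
    rows, cols = shape
    return [[rng.choice((-1, 1)) for _ in range(cols)] for _ in range(rows)]


W_sketch = random_bipolar((8, 8), seed=0)
print(len(W_sketch), len(W_sketch[0]))  # 8 8
```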


<font size="3">The model is saved and then netron is used to visualize the resulting model. </font>

In [8]:
model.save("FCLayer_graph.onnx")

In [9]:
import netron
netron.start('FCLayer_graph.onnx', port=8081, host="0.0.0.0")

Serving 'FCLayer_graph.onnx' at http://0.0.0.0:8081


In [10]:
%%html
<iframe src="http://0.0.0.0:8081/" style="position: relative; width: 100%;" height="400"></iframe>

### Code Generation
<font size="3">Code generation is a transformation pass that can be applied to the model. For more information about transformation passes, see Jupyter Notebook [FINN-HowToTransformPass](FINN-HowToTransformPass.ipynb).

The code generation transformation is shown below.</font>

In [11]:
from finn.transformation.fpgadataflow.codegen import CodeGen
showSrc(CodeGen)

class CodeGen(Transformation):
    """Code generation for all nodes in model"""

    def apply(self, model):
        for node in model.graph.node:
            if node.domain == "finn":
                backend_attribute = get_by_name(node.attribute, "backend")
                if backend_attribute is None:
                    continue
                backend_value = backend_attribute.s.decode("UTF-8")
                if backend_value == "fpgadataflow":
                    _codegen_single_node(node, model)
        return (model, False)



<font size="3">The transformation pass iterates over all nodes in the model. For every node with `domain="finn"` and `backend="fpgadataflow"`, the function `_codegen_single_node()` is executed, which is also part of the transformation pass and is shown below. </font>

In [12]:
from finn.transformation.fpgadataflow.codegen import _codegen_single_node
showSrc(_codegen_single_node)

def _codegen_single_node(node, model):
    """Call custom implementation to generate code for single custom node
    and create folder that contains all the generated files"""
    op_type = node.op_type
    try:
        # lookup op_type in registry of CustomOps
        inst = registry.custom_op[op_type](node)
        # get the path of the code generation directory
        code_gen_dir = inst.get_nodeattr("code_gen_dir")
        # ensure that there is a directory
        if code_gen_dir == "" or not os.path.isdir(code_gen_dir):
            code_gen_dir = tmp.mkdtemp(prefix="code_gen_" + str(node.op_type) + "_")
            inst.set_nodeattr("code_gen_dir", code_gen_dir)
        # ensure that there is generated code inside the dir
        inst.code_generation(model)
    except KeyError:
        # exception if op_type is not supported
        raise Exception("Custom op_type %s is currently not supported." % op_type)
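
<font size="3">The directory handling in the function above can be tried in isolation. The sketch assumes that `tmp` in the FINN source refers to Python's `tempfile` module. The helper name `ensure_code_gen_dir` is hypothetical, and it returns the path instead of setting a node attribute.</font>

```python
import os
import tempfile


def ensure_code_gen_dir(code_gen_dir, op_type):
    """Return an existing directory, creating a temporary one if needed.

    Hypothetical helper mirroring the check in _codegen_single_node.
    """
    if code_gen_dir == "" or not os.path.isdir(code_gen_dir):
        code_gen_dir = tempfile.mkdtemp(prefix="code_gen_" + op_type + "_")
    return code_gen_dir


first = ensure_code_gen_dir("", "StreamingFCLayer_Batch")
second = ensure_code_gen_dir(first, "StreamingFCLayer_Batch")
print(first == second)  # True: an existing directory is reused
```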



<font size="3">An instance of the node is created and checked for the attribute `code_gen_dir`. If the attribute is empty or does not point to an existing directory, a temporary directory is created and the attribute is set accordingly. 

Then the `code_generation()` function of the instance is called. If the op_type cannot be found in the registry, a `KeyError` is caught and an exception is raised stating that the custom op_type is currently not supported. The following description of the code generation within the CustomOp instance may overlap with the Jupyter notebook [FINN-CustomOps](FINN-CustomOps.ipynb).

In order to clarify the individual components involved in code generation, an instance of the node is first created, as in the `_codegen_single_node` function. This is done by looking up the op_type in the [registry](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/registry.py) of CustomOps. The instance contains a template for code generation which is shown below.</font>

In [13]:
import finn.custom_op.registry as registry
node = FCLayer_node
op_type = FCLayer_node.op_type
inst = registry.custom_op[op_type](node)
print(inst.docompute_template)


        #include "cnpy.h"
        #include "npy2apintstream.hpp"
        #include <vector>
        #include "bnn-library.h"

        // includes for network parameters
        $GLOBALS$

        // defines for network parameters
        $DEFINES$

        int main(){

        $STREAMDECLARATIONS$

        $READNPYDATA$

        $DOCOMPUTE$

        $DATAOUTSTREAM$

        $SAVEASCNPY$

        }

        


<font size="3">The template has some general constructs, like the inclusion of bnn-library.h, which contains the references to the finn-hls library, and of cnpy.h and npy2apintstream.hpp, which support the transfer of Python numpy arrays to C++. The idea of this template is to replace the variables marked with `$ $` with C++ code during code generation. Then the template can be written into a .cpp file and compiled. 

The sub-functions that are called during code generation are shown below.</font>

In [14]:
from finn.custom_op.fpgadataflow.streamingfclayer_batch import StreamingFCLayer_Batch
showSrc(StreamingFCLayer_Batch.code_generation)

    def code_generation(self, model):
        node = self.onnx_node
        self.generate_params(model)
        self.global_includes()
        self.defines()
        self.read_npy_data()
        self.strm_decl()
        self.docompute()
        self.dataoutstrm()
        self.save_as_npy()

        template = self.docompute_template

        for key in self.code_gen_dict:
            # transform list into long string separated by '\n'
            code_gen_line = "\n".join(self.code_gen_dict[key])
            template = template.replace(key, code_gen_line)
        code_gen_dir = self.get_nodeattr("code_gen_dir")
        f = open(os.path.join(code_gen_dir, "execute_{}.cpp".format(node.op_type)), "w")
        f.write(template)
        f.close()



<font size="3">Except for the function `generate_params(model)`, all functions needed to fill the template correspond to the `$ $` variable names, e.g. the function `defines()` produces the part of the C++ code that replaces `$DEFINES$` in the template. The individual functions are member functions of the class HLSCustomOp and are defined in each CustomOp. The code for a StreamingFCLayer_Batch node can be looked up in the [code](https://github.com/Xilinx/finn/blob/dev/src/finn/custom_op/fpgadataflow/streamingfclayer_batch.py).</font>
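
<font size="3">The substitution loop at the end of `code_generation` can be sketched on a shortened template. The template text and the dictionary entries below are placeholders chosen for illustration, not the code that FINN generates, and the function name `fill_template` is hypothetical.</font>

```python
template = """
// defines for network parameters
$DEFINES$

int main(){
$DOCOMPUTE$
}
"""

# Each key maps to a list of C++ lines; the contents here are placeholders.
code_gen_dict = {
    "$DEFINES$": ["#define MW 8", "#define MH 8"],
    "$DOCOMPUTE$": ["// call into the HLS library goes here"],
}


def fill_template(template, code_gen_dict):
    """Replace every $KEY$ placeholder with its lines joined by newlines."""
    for key, lines in code_gen_dict.items():
        template = template.replace(key, "\n".join(lines))
    return template


print(fill_template(template, code_gen_dict))
```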

<font size="3">A special function for code generation for the StreamingFCLayer_Batch node is the `generate_params(model)` function. Besides the normal input tensor, an FC layer has weight values as input and can get additional thresholds for activation. This function reads the values for the weights and thresholds via the `get_initializer` function of the ModelWrapper and writes them in C++-compatible form to .h files, which are added to the includes. 

The `generate_params` function of the StreamingFCLayer_Batch is shown below.</font>

In [15]:
showSrc(StreamingFCLayer_Batch.generate_params)

    def generate_params(self, model):
        # weights
        weights = model.get_initializer(self.onnx_node.input[1])
        # convert weights into hlslib-compatible format
        weight_tensor = self.get_hls_compatible_weight_tensor(weights)
        export_wdt = self.get_weight_datatype()
        # we have converted bipolar weights to binary for export,
        # so use it as such for weight generation
        if self.get_weight_datatype() == DataType.BIPOLAR:
            export_wdt = DataType.BINARY
        weight_hls_code = numpy_to_hls_code(
            weight_tensor, export_wdt, "weights", True, True
        )
        # write weights into params.h
        code_gen_dir = self.get_nodeattr("code_gen_dir")
        f_weights = open("{}/params.h".format(code_gen_dir), "w")

        if export_wdt.bitwidth() != 1:
            f_weights.write(
                "static FixedPointWeights<{},{},{},{}> weights = ".format(
                    self.get_nodeattr("SIMD"),
                    

<font size="3">The generated code is written to the previously created temporary directory and the node attribute `code_gen_dir` is set. This completes the code generation for executing a single CustomOp. The next step is compilation. </font>

### Compilation

<font size="3">The compilation is a transformation pass like the code generation. The code of this transformation is shown below. </font>

In [16]:
from finn.transformation.fpgadataflow.compile import Compile
showSrc(Compile)

class Compile(Transformation):
    """Compile for all nodes in model"""

    def __init__(self):
        super().__init__()

    def apply(self, model):
        for node in model.graph.node:
            op_type = node.op_type
            if node.domain == "finn":
                backend_attribute = util.get_by_name(node.attribute, "backend")
                if backend_attribute is None:
                    continue
                backend_value = backend_attribute.s.decode("UTF-8")
                if backend_value == "fpgadataflow":
                    try:
                        # lookup op_type in registry of CustomOps
                        inst = registry.custom_op[op_type](node)
                        # ensure that code is generated
                        assert inst.get_nodeattr("code_gen_dir") != ""
                        # call the compilation function for this node
                        inst.compile_singlenode_code()
                        # ensure that executable path

<font size="3">The scheme resembles that of the code generation transformation pass. The pass iterates over all nodes in the model, and for every node with `domain="finn"` and `backend="fpgadataflow"` the compilation is activated. First an instance of the node is created, and the node attribute `code_gen_dir` is checked to ensure that the code was generated. If it is not empty, the function `compile_singlenode_code()` is executed. Then it is checked whether the path to the executable has been set. An exception is raised if the custom op_type is not supported. 

The actual compilation is done with the function `compile_singlenode_code()`. What happens inside the function is shown below.</font>

In [17]:
showSrc(StreamingFCLayer_Batch.compile_singlenode_code)

    def compile_singlenode_code(self):
        code_gen_dir = self.get_nodeattr("code_gen_dir")
        builder = CppBuilder()
        builder.append_includes("-I/workspace/finn/src/finn/data/cpp")
        builder.append_includes("-I/workspace/cnpy/")
        builder.append_includes("-I/workspace/finn-hlslib")
        builder.append_includes("-I/workspace/vivado-hlslib")
        builder.append_includes("--std=c++11")
        builder.append_sources(code_gen_dir + "/*.cpp")
        builder.append_sources("/workspace/cnpy/cnpy.cpp")
        builder.append_includes("-lz")
        builder.set_executable_path(code_gen_dir + "/node_model")
        builder.build(code_gen_dir)
        self.set_nodeattr("executable_path", builder.executable_path)



<font size="3">To execute the compilation, the class `CppBuilder` from `core.utils` is used. Its member functions are used to construct the g++ command. To better understand the exact procedure, the class `CppBuilder` is shown below. </font>

In [18]:
showSrc(util.CppBuilder)

class CppBuilder:
    def __init__(self):
        self.include_paths = []
        self.cpp_files = []
        self.executable_path = ""
        self.code_gen_dir = ""
        self.compile_components = []
        self.compile_script = ""

    def append_includes(self, library_path):
        self.include_paths.append(library_path)

    def append_sources(self, cpp_file):
        self.cpp_files.append(cpp_file)

    def set_executable_path(self, path):
        self.executable_path = path

    def build(self, code_gen_dir):
        # raise error if includes are empty
        self.code_gen_dir = code_gen_dir
        self.compile_components.append("g++ -o " + str(self.executable_path))
        for cpp_file in self.cpp_files:
            self.compile_components.append(cpp_file)
        for lib in self.include_paths:
            self.compile_components.append(lib)
        bash_compile = ""
        for component in self.compile_components:
            bash_compile += str(component) + " "
      

<font size="3">The class contains several member variables needed to execute the compilation command. These are initialized empty when the class is instantiated. The `append_includes`, `append_sources` and `set_executable_path` functions fill these variables, and in the `build` function everything is combined into one compile command, which is then executed using the Python library `subprocess`. 

After the executable has been created, the `compile_singlenode_code` function sets the `executable_path` node attribute.</font>
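
<font size="3">How the parts are combined into a single command can be sketched without invoking a compiler. The function `build_command` is hypothetical, and all paths are made up for illustration. It follows the order used in `build` (output, then sources, then include options) and only returns the string.</font>

```python
def build_command(executable_path, cpp_files, options):
    """Join the parts of a g++ call: output, then sources, then options."""
    components = ["g++ -o " + executable_path]
    components.extend(cpp_files)
    components.extend(options)
    return " ".join(components)


cmd = build_command(
    "/tmp/example/node_model",          # hypothetical output path
    ["/tmp/example/*.cpp"],             # hypothetical source glob
    ["-I/tmp/include", "--std=c++11"],  # hypothetical include path and flag
)
print(cmd)
# g++ -o /tmp/example/node_model /tmp/example/*.cpp -I/tmp/include --std=c++11
```

<font size="3">Such a string has to be run through a shell so that the `*.cpp` pattern is expanded. The sketch stops at building the string.</font>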

<font size="3">This flow is needed for the execution of a single CustomOp node. The execution itself is represented in function `execute_node` of the respective node class. The last part of this Jupyter notebook is about this function.</font>

### CustomOp node execution

<font size="3">The function `execute_node` of StreamingFCLayer_Batch is displayed below. The class HLSCustomOp also has an `execute_node` function, which contains the basic principle of the execution. However, for the StreamingFCLayer_Batch node further transformations are necessary. </font>

In [20]:
showSrc(StreamingFCLayer_Batch.execute_node)

    def execute_node(self, context, graph):
        node = self.onnx_node
        mw = self.get_nodeattr("MW")
        mh = self.get_nodeattr("MH")
        simd = self.get_nodeattr("SIMD")
        pe = self.get_nodeattr("PE")
        sf = mw // simd
        nf = mh // pe

        # TODO ensure codegen dir exists
        code_gen_dir = self.get_nodeattr("code_gen_dir")
        # create a npy file fore each input of the node (in_ind is input index)
        in_ind = 0
        for inputs in node.input:
            # it is assumed that the first input of the node is the data input
            # the second input are the weights
            # the third input are the thresholds
            if in_ind == 0:
                assert str(context[inputs].dtype) == "float32"
                expected_inp_shape = (1, sf, simd)
                reshaped_input = context[inputs].reshape(expected_inp_shape)
                # flip SIMD (innermost) dimension of input tensor, there's some reversal
             

<font size="3">First, all parameters are extracted using the `get_nodeattr` function. It is also important to read the code generation directory via `code_gen_dir`. `execute_node` is divided into three parts:</font>
* <font size="3">creation of a npy file for each input of the node</font>
* <font size="3">execution of the precompiled model</font>
* <font size="3">loading the output npy file</font>

#### Creation of a npy file for each input of the node

<font size="3">To transfer the input values correctly to the C++ model, the input tensor has to be reshaped and the innermost dimension (SIMD) has to be flipped. Afterwards the tensor can be stored in a .npy file. 

Since the StreamingFCLayer_Batch node has a maximum of three inputs (input, weights, thresholds), an error will be thrown if this number is exceeded. The weights and thresholds have already been written to separate .h files, and therefore only the input tensor has to be stored in a .npy file. </font>
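
<font size="3">The reshape and flip of the data input can be followed on a small example. In this sketch plain Python lists stand in for the numpy arrays used in `execute_node`, and the function name `fold_and_flip` is hypothetical. It folds a flat vector into shape `(1, sf, simd)` and reverses each innermost group.</font>

```python
def fold_and_flip(values, sf, simd):
    """Reshape a flat list to shape (1, sf, simd) and reverse each SIMD group."""
    if len(values) != sf * simd:
        raise ValueError("expected %d values, got %d" % (sf * simd, len(values)))
    groups = [values[i * simd:(i + 1) * simd] for i in range(sf)]
    return [[group[::-1] for group in groups]]


print(fold_and_flip([0, 1, 2, 3, 4, 5, 6, 7], sf=2, simd=4))
# [[[3, 2, 1, 0], [7, 6, 5, 4]]]
```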

#### Execution of the precompiled model
<font size="3">The function from class HLSCustomOp is used here. It is shown below.</font>

In [22]:
showSrc(StreamingFCLayer_Batch.exec_precompiled_singlenode_model)

    def exec_precompiled_singlenode_model(self):
        # execute precompiled executable
        executable_path = self.get_nodeattr("executable_path")
        if executable_path == "":
            raise Exception(
                """
Found no executable for this node, did you run the codegen and
compilation transformations?
            """
            )
        process_execute = subprocess.Popen(executable_path, stdout=subprocess.PIPE)
        process_execute.communicate()



<font size="3">After checking that the attribute `executable_path` is not empty, the executable is run via `subprocess`. The C++ code writes its output into a .npy file, which is read later on. </font>
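
<font size="3">The same pattern of checking the path and then running the program with `subprocess.Popen` can be tried without a compiled node model. In the sketch below the Python interpreter stands in for the executable, and the function name `run_executable` is hypothetical. Unlike the FINN function, it returns the captured standard output so that the effect is visible.</font>

```python
import subprocess
import sys


def run_executable(command):
    """Run a command and return its captured standard output as bytes."""
    if not command:
        raise Exception("Found no executable, was the compilation step run?")
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    stdout, _ = process.communicate()
    return stdout


# The Python interpreter stands in for the compiled node model here.
print(run_executable([sys.executable, "-c", "print('done')"]))
```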

#### Loading the output npy file

<font size="3">To load the output data the function `npy_to_dynamic_output` is used. It is shown below. </font>

In [23]:
showSrc(StreamingFCLayer_Batch.npy_to_dynamic_output)

    def npy_to_dynamic_output(self, context):
        # TODO support multi-output nodes as needed
        node = self.onnx_node
        code_gen_dir = self.get_nodeattr("code_gen_dir")
        output = np.load("{}/output.npy".format(code_gen_dir))
        context[node.output[0]] = output



<font size="3">Since the output file is stored in the same directory as the generated code, the attribute `code_gen_dir` is read first and then the output data is loaded using the numpy function `np.load`. The context is set accordingly. 

Finally, the output data is manipulated in the `execute_node` function. If the data is bipolar, it is converted into binary data for further processing. Then the shape of the tensor is checked and converted into the expected output shape.
</font>
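
<font size="3">The bipolar and binary encodings mentioned above are related by a simple affine map. The sketch shows both directions on plain Python lists. The helper names are hypothetical, and which direction `execute_node` applies at which point should be checked against the FINN source.</font>

```python
def bipolar_to_binary(values):
    """Map -1 -> 0 and +1 -> 1."""
    return [(v + 1) // 2 for v in values]


def binary_to_bipolar(values):
    """Map 0 -> -1 and 1 -> +1."""
    return [2 * v - 1 for v in values]


bipolar = [-1, 1, 1, -1]
binary = bipolar_to_binary(bipolar)
print(binary)                     # [0, 1, 1, 0]
print(binary_to_bipolar(binary))  # [-1, 1, 1, -1]
```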

### Conclusion

<font size="3">Code generation and compilation are transformation passes that must be applied before a node can be executed. They are independent of the execution of a node and can be used for further development to enable functions such as code generation for synthesis or larger models. 

All files belonging to the code generation and compilation are stored in the directory which is specified in `code_gen_dir`.

**Important**: If the code is executed inside the docker container, the directory will be deleted after closing the container. 
    
For further reading, please see the /tests folder of the FINN repo. The subfolder /fpgadataflow contains, for example, [test_fpgadataflow_fclayer](https://github.com/Xilinx/finn/blob/dev/tests/fpgadataflow/test_fpgadataflow_fclayer.py), which tests the functionality of the flow described above.</font>