# Model Layer Module

We now turn to the `Layer` and `Module` classes. A `Layer` corresponds to a node in the ONNX graph; when it is implemented in hardware, it is decomposed into multiple `Module`s.

In [1]:
from fpgaconvnet.parser.Parser import Parser

onnx_path = "../3.1_model_onnx_parser/fp16/vgg16_bn.onnx"
parser = Parser(custom_onnx=True, batch_size=1)
net = parser.onnx_to_fpgaconvnet(onnx_path)
conv_0_layer = net.partitions[0].graph.nodes["Conv_0"]["hw"]
conv_0_layer.modules



OrderedDict([('pad',
              Pad(rows=32, cols=32, channels=3, data_width=16, rsc_coef={'FF': [], 'LUT': [], 'DSP': [], 'BRAM': []}, pad_top=1, pad_bottom=1, pad_left=1, pad_right=1, backend='chisel', regression_model='linear_regression', streams=1, latency_mode=False, block=False)),
             ('sliding_window',
              SlidingWindow(rows=34, cols=34, channels=3, data_width=16, rsc_coef={'Logic_LUT': array([6.46840521, 0.        , 0.        , 3.92659388, 2.45084112]), 'LUT_RAM': array([ 0.07302216,  0.        , 14.23619095]), 'LUT_SR': array([0.]), 'FF': array([ 0.        ,  0.        , 20.77153538,  0.        ,  0.        ,
                      0.        ,  5.21236925,  3.44020551,  0.        ,  0.        ]), 'DSP': array([0.]), 'BRAM36': array([0.]), 'BRAM18': array([0.])}, kernel_size=[3, 3], stride=[1, 1], pad_top=0, pad_right=0, pad_bottom=0, pad_left=0, backend='chisel', regression_model='linear_regression', streams=1)),
             ('squeeze',
              Sque

Every layer has a set of tunable design parameters that control how its modules are configured, trading resources for performance. For example, for a convolutional layer fpgaConvNet provides `coarse_in`, `coarse_out`, `coarse_group` and `fine`, which set the parallelism along the input-channel, output-channel, group and kernel-size dimensions respectively.

In [2]:
print("coarse_in: ", conv_0_layer.coarse_in)
print("coarse_out: ", conv_0_layer.coarse_out)
print("coarse_group: ", conv_0_layer.coarse_group)
print("fine: ", conv_0_layer.fine)

coarse_in:  1
coarse_out:  1
coarse_group:  1
fine:  1
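To give a feel for which values such a parameter could take, here is a minimal sketch. It assumes that a parallelism factor must divide the dimension it parallelises evenly; that rule, the helper `divisors`, and the figure of 64 output channels are illustrative assumptions rather than something taken from the fpgaConvNet source.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# 3 input channels and a 3x3 kernel appear in the module printout above;
# 64 output channels is an assumption made for this illustration.
channels_in, channels_out, kernel_area = 3, 64, 3 * 3

# Hypothetical candidate values per design parameter (assumed rule: divisors).
candidates = {
    "coarse_in": divisors(channels_in),
    "coarse_out": divisors(channels_out),
    "fine": divisors(kernel_area),
}

for name, values in candidates.items():
    print(name, values)
```

Under this assumption, the smallest candidate for every parameter is 1, which is the fully sequential configuration printed above.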


For now, all four parameters are 1, which means everything within the layer is executed sequentially. To examine the latency and resource utilization of the current `conv_0_layer`, call its [`resource()`](https://github.com/AlexMontgomerie/fpgaconvnet-model/blob/dev-petros/fpgaconvnet/models/layers/ConvolutionLayer.py#L590) and [`latency()`](https://github.com/AlexMontgomerie/fpgaconvnet-model/blob/dev-petros/fpgaconvnet/models/layers/Layer.py#L305) functions.

In [3]:
print("Latency (cycle):", conv_0_layer.latency())
print(conv_0_layer.resource())

Latency (cycle): 1769472
{'LUT': 4206, 'FF': 2108, 'DSP': 1, 'BRAM': 4, 'URAM': 0}


For resource prediction, fpgaConvNet uses linear regression to estimate resources at the module level. Each module has an `rsc_coef` attribute, holding coefficients that were generated beforehand by running a set of designs through synthesis, while the `utilisation_model` function defines the variables that affect resource usage. The final resource estimate is obtained by calling the [`rsc`](https://github.com/AlexMontgomerie/fpgaconvnet-model/blob/dev-petros/fpgaconvnet/models/modules/Module.py#L111) function of the individual module. For example:

In [4]:
accum_module = conv_0_layer.modules["accum"]
print("rsc_coef: ", accum_module.rsc_coef)
print("utilisation_model: ", accum_module.utilisation_model())
print("accum rsc: ",accum_module.rsc())

rsc_coef:  {'Logic_LUT': array([0.85353288, 0.        , 3.51035984, 0.        , 0.        ,
       0.        , 0.        ]), 'LUT_RAM': array([13.67379136,  0.01678309,  0.92521122,  0.        ]), 'LUT_SR': array([0.]), 'FF': array([0.        , 3.20979382, 0.        , 0.        , 0.00732597,
       0.28885062, 0.        ]), 'DSP': array([0.]), 'BRAM36': array([0.]), 'BRAM18': array([0.])}
utilisation_model:  {'Logic_LUT': array([64, 27, 32,  1,  5,  6,  1]), 'LUT_RAM': array([   3, 2048,   32,   64]), 'LUT_SR': array([0]), 'FF': array([32, 32,  5,  6, 27, 64,  1]), 'DSP': array([0]), 'BRAM36': array([0]), 'BRAM18': array([0])}
accum rsc:  {'Logic_LUT': 166, 'LUT_RAM': 104, 'LUT_SR': 0, 'FF': 121, 'DSP': 0, 'BRAM36': 0, 'BRAM18': 0, 'LUT': 270, 'BRAM': 0}
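The printed numbers can be checked by hand. The sketch below copies the coefficients and model variables from the output above and takes their dot product per resource type. Truncating the result to an integer and summing the three LUT categories into `LUT` are assumptions chosen because they reproduce the printed `accum rsc` values; this is not a copy of the library's `rsc` implementation.

```python
# Values copied from the printed rsc_coef and utilisation_model output.
rsc_coef = {
    "Logic_LUT": [0.85353288, 0.0, 3.51035984, 0.0, 0.0, 0.0, 0.0],
    "LUT_RAM": [13.67379136, 0.01678309, 0.92521122, 0.0],
    "LUT_SR": [0.0],
    "FF": [0.0, 3.20979382, 0.0, 0.0, 0.00732597, 0.28885062, 0.0],
}
utilisation_model = {
    "Logic_LUT": [64, 27, 32, 1, 5, 6, 1],
    "LUT_RAM": [3, 2048, 32, 64],
    "LUT_SR": [0],
    "FF": [32, 32, 5, 6, 27, 64, 1],
}

# Linear model: estimate = coefficients . variables (a plain dot product).
# Truncation with int() is an assumption that matches the printed output.
estimate = {
    name: int(sum(c * x for c, x in zip(rsc_coef[name], utilisation_model[name])))
    for name in rsc_coef
}

# Assumed aggregation: total LUT is the sum of the three LUT categories.
estimate["LUT"] = estimate["Logic_LUT"] + estimate["LUT_RAM"] + estimate["LUT_SR"]

print(estimate)
```

With these inputs the sketch yields 166 logic LUTs, 104 LUT RAMs, 121 FFs and 270 LUTs in total, the same figures reported by `accum_module.rsc()`.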


We can also modify the design parameters of each layer to trade resources for better performance. For example, we can maximize the parallelism of the convolutional layer by running:

In [5]:
conv_0_layer.coarse_in = conv_0_layer.get_coarse_in_feasible()[-1]
conv_0_layer.coarse_out = conv_0_layer.get_coarse_out_feasible()[-1]
conv_0_layer.coarse_group = conv_0_layer.get_coarse_group_feasible()[-1]
conv_0_layer.fine = conv_0_layer.get_fine_feasible()[-1]

conv_0_layer.update()

print("coarse_in: ", conv_0_layer.coarse_in)
print("coarse_out: ", conv_0_layer.coarse_out)
print("coarse_group: ", conv_0_layer.coarse_group)
print("fine: ", conv_0_layer.fine)

print("Latency (cycle):", conv_0_layer.latency())
print(conv_0_layer.resource())

coarse_in:  3
coarse_out:  64
coarse_group:  1
fine:  9
Latency (cycle): 1156
{'LUT': 66973, 'FF': 94988, 'DSP': 1728, 'BRAM': 828, 'URAM': 0}
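A quick arithmetic check helps put these numbers in context. The sketch below is one way of reading the reported figures, not the library's latency model. The 32x32 feature map, 3 input channels, 3x3 kernel and padding of 1 come from the module printout; 64 output filters is an assumption based on the reported `coarse_out`.

```python
rows, cols = 32, 32          # feature map size, from the Pad module printout
channels, filters = 3, 64    # filters = 64 is an assumption (matches coarse_out)
kernel_rows, kernel_cols = 3, 3
pad = 1                      # pad_top / pad_bottom / pad_left / pad_right

# Multiply-accumulate operations needed for the whole layer.
macs = rows * cols * channels * filters * kernel_rows * kernel_cols
print("MACs:", macs)  # 1769472, equal to the sequential latency in cycles

# Product of the parallelism factors after maximisation.
coarse_in, coarse_out, coarse_group, fine = 3, 64, 1, 9
parallelism = coarse_in * coarse_out * coarse_group * fine
print("Parallel multipliers:", parallelism)  # 1728, equal to the reported DSP count

# Cycles needed if compute were the only limit.
ideal_cycles = macs // parallelism
print("Compute-bound cycles:", ideal_cycles)  # 1024

# Number of pixels in the padded input seen by the sliding window.
padded_pixels = (rows + 2 * pad) * (cols + 2 * pad)
print("Padded input pixels:", padded_pixels)  # 1156, equal to the reported latency
```

The reported latency of 1156 cycles coincides with the 34x34 padded input rather than the 1024 compute-bound cycles. This suggests, although the text does not state it, that streaming the padded input becomes the bottleneck once the arithmetic is fully parallel.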


Note that the `update` function propagates the change from the layer level to the module level.