# Device Setup

## Import Libraries

FINNExampleOverlay handles data transfers between the processing system (controlled directly by this notebook) and the programmable logic. It packs the input data into the format the models expect, then sends data to and receives data from the programmable logic via DMAs.<br>
get_data contains functions that load test data from .npz files and format it into the required buffer lengths.<br>
NumPy is used for data handling. platform and pynq provide information about the device being used.

In [1]:
from driver_base import FINNExampleOverlay
import get_data
import numpy as np
import platform
import pynq

## Get Device Information

The overlay needs some extra information about the device: whether it is an edge (Zynq) or PCIe (Alveo) platform, and the name of the active device.

In [2]:
def get_edge_or_pcie():
    cpu = platform.processor()
    if cpu in ["armv7l", "aarch64"]:
        return "edge"
    elif cpu in ["x86_64"]:
        return "pcie"
    else:
        raise OSError("Platform is not supported.")

In [3]:
driver_modes = {"edge": "zynq-iodma", "pcie": "alveo"}
target_platform = pynq.Device.active_device.name
driver_mode = driver_modes[get_edge_or_pcie()]

# Run Multi-Layer Perceptron (MLP)

## Get Test Data

The MLP expects a buffer of 256 IQ samples, and the expected labels are binary values indicating whether each of four channels is occupied. The IQ data is interleaved in an alternating pattern, [i,q,i,q,i,q,...], so each input holds 512 values. Because of the PYNQ-Z2's limited storage, only 176 MB of the available 40 GB of data is tested, but this still gives 42900 test cases.

In [4]:
normalized_array, labels = get_data.get_mlp_data()

In [5]:
print(normalized_array.shape)
print(labels.shape)

(42900, 1, 512)
(42900, 4)
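As a rough illustration of the interleaving described above, the sketch below builds one input row from 256 complex samples. It assumes each buffer entry is a single complex IQ sample (which is consistent with the 512-wide inputs) and uses random placeholder values; it is not the code used in get_data.

```python
import numpy as np

# Hypothetical example: 256 complex IQ samples with random placeholder values.
rng = np.random.default_rng(0)
iq = rng.integers(-128, 128, size=256) + 1j * rng.integers(-128, 128, size=256)

# Interleave as [i, q, i, q, ...]: put I and Q side by side, then flatten row by row.
interleaved = np.stack([iq.real, iq.imag], axis=-1).reshape(1, -1)

print(interleaved.shape)  # (1, 512)
```

Even indices of the resulting row hold the I values and odd indices hold the Q values.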


## Initialise the MLP

The bitfile generated in Vivado contains all the information needed to create the MLP in the programmable logic. The io_shape_dict describes how the data should be formatted for inputs and outputs. The overlay uses the bitfile to instantiate the model in hardware and sets up the DMA connections that pass data to it. The model can receive sets of data at up to 100 MHz, with an output produced every 16 sets of input.

In [6]:
from driver_final_mlp.driver import io_shape_dict

bitfile = "bitfile_final_mlp/finn-accel.bit"
mlp = FINNExampleOverlay(bitfile, driver_mode, io_shape_dict, fclk_mhz=100.0)
print(io_shape_dict)

{'idt': [INT8], 'odt': [BIPOLAR], 'ishape_normal': [(1, 512)], 'oshape_normal': [(1, 4)], 'ishape_folded': [(1, 32, 16)], 'oshape_folded': [(1, 4, 1)], 'ishape_packed': [(1, 32, 16)], 'oshape_packed': [(1, 4, 1)], 'input_dma_name': ['idma0'], 'output_dma_name': ['odma0'], 'number_of_external_weights': 0, 'num_inputs': 1, 'num_outputs': 1}
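The relationship between ishape_normal and ishape_folded can be sketched with NumPy. This assumes folding is a plain reshape that preserves element order, which is not shown in this notebook; the shapes are copied from the dictionary printed above.

```python
import numpy as np

ishape_normal = (1, 512)     # copied from io_shape_dict above
ishape_folded = (1, 32, 16)  # copied from io_shape_dict above

# Placeholder input whose values equal their position, to make the layout visible.
x = np.arange(512).reshape(ishape_normal)

# Assumption: folding only regroups the 512 values into 32 groups of 16.
x_folded = x.reshape(ishape_folded)

print(x_folded.shape)       # (1, 32, 16)
print(x_folded[0, 1, :4])   # [16 17 18 19]
```

Under this assumption no values are reordered or dropped: the second group simply starts at element 16 of the original row.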


## Test with a Single Input

To check that the model can receive data and produce a prediction, a single input is sent to the MLP. The input data is converted to 8-bit integers, as this is what the model will receive from the rest of the radio.

In [7]:
test_single = normalized_array[1].astype(np.int8)
test_single_label = labels[1]

print(test_single.shape)
print(test_single_label)

(1, 512)
[0. 0. 0. 1.]
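One thing worth keeping in mind with astype(np.int8) is that fractional parts are discarded and values outside [-128, 127] are not checked. The sketch below uses a hypothetical helper, to_int8_checked, that is not part of get_data or the FINN driver; it only illustrates a guarded cast.

```python
import numpy as np

def to_int8_checked(x):
    """Hypothetical helper: cast to int8 after confirming the values fit."""
    info = np.iinfo(np.int8)
    if x.min() < info.min or x.max() > info.max:
        raise ValueError("values fall outside the int8 range [-128, 127]")
    # astype truncates toward zero, so 1.9 becomes 1 and -1.9 becomes -1.
    return x.astype(np.int8)

sample = np.array([[1.9, -1.9, 127.0, -128.0]])
print(to_int8_checked(sample).tolist())  # [[1, -1, 127, -128]]
```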


The overlay provides the execute function, which formats the data and sends it to the programmable logic, then waits until it receives an output and returns it.

In [8]:
mlp.batch_size = 1
accel_out = mlp.execute(test_single)

Since the model output is bipolar rather than binary, it uses -1 rather than 0 to represent no user in a channel, unlike the labels.

In [9]:
print(accel_out)

[[-1. -1. -1. -1.]]
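The bipolar output can be mapped to the binary label convention with (x + 1) / 2, so that -1 becomes 0 and 1 stays 1. A minimal sketch using the values printed above:

```python
import numpy as np

accel_out = np.array([[-1., -1., -1., -1.]])  # MLP output printed above
label = np.array([0., 0., 0., 1.])            # test_single_label printed above

# Map {-1, 1} to {0, 1} so the output is directly comparable with the label.
binary_out = (accel_out + 1) / 2
matches = binary_out.flatten() == label

print(binary_out.tolist())  # [[0.0, 0.0, 0.0, 0.0]]
print(matches.tolist())     # [True, True, True, False]
```

For this particular sample the MLP agrees with the label on three of the four channels.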


## Test with All Data

To verify that the model functions as expected, the full available dataset is run through it. The bipolar outputs are converted to binary and compared with the labels to obtain the model accuracy.

In [10]:
# Split the input data into 20 equally sized batches (42900 / 2145 = 20)
batch_size = 2145
mlp.batch_size = batch_size
test_imgs, test_labels = normalized_array.astype(np.int8), labels
total = test_imgs.shape[0]
n_batches = total // batch_size
test_imgs = test_imgs.reshape(n_batches, batch_size, 512)
test_labels = test_labels.reshape(n_batches, batch_size, 4)

# Loop through the batches and execute each one on the MLP
ok = 0
nok = 0
for i in range(n_batches):
    inp = test_imgs[i]
    exp = test_labels[i].astype(np.float32).flatten()
    # Map bipolar outputs {-1, 1} to binary {0, 1} to match the labels
    out = (mlp.execute(inp).flatten() + 1) / 2
    ok += np.count_nonzero(out == exp)
    nok += np.count_nonzero(out != exp)
    acc = 100.0 * ok / (ok + nok)
    print("batch %d / %d : total OK %d NOK %d : accuracy %.2f%%" % (i + 1, n_batches, ok, nok, acc))

acc = 100.0 * ok / (ok + nok)
print("Final accuracy: {:.2f}%".format(acc))

batch 1 / 20 : total OK 6753 NOK 1827 : accuracy 78.71%
batch 2 / 20 : total OK 13413 NOK 3747 : accuracy 78.16%
batch 3 / 20 : total OK 19829 NOK 5911 : accuracy 77.04%
batch 4 / 20 : total OK 26688 NOK 7632 : accuracy 77.76%
batch 5 / 20 : total OK 34326 NOK 8574 : accuracy 80.01%
batch 6 / 20 : total OK 42152 NOK 9328 : accuracy 81.88%
batch 7 / 20 : total OK 50152 NOK 9908 : accuracy 83.50%
batch 8 / 20 : total OK 57316 NOK 11324 : accuracy 83.50%
batch 9 / 20 : total OK 64179 NOK 13041 : accuracy 83.11%
batch 10 / 20 : total OK 69848 NOK 15952 : accuracy 81.41%
batch 11 / 20 : total OK 75736 NOK 18644 : accuracy 80.25%
batch 12 / 20 : total OK 83730 NOK 19230 : accuracy 81.32%
batch 13 / 20 : total OK 91590 NOK 19950 : accuracy 82.11%
batch 14 / 20 : total OK 99262 NOK 20858 : accuracy 82.64%
batch 15 / 20 : total OK 105000 NOK 23700 : accuracy 81.59%
batch 16 / 20 : total OK 108335 NOK 28945 : accuracy 78.92%
batch 17 / 20 : total OK 114630 NOK 31230 : accuracy 78.59%
batch 18 / 
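Note that the accuracy above is counted per channel rather than per input: each batch contributes 2145 × 4 = 8580 comparisons, which matches the first line of output (6753 + 1827). The counting step can be sketched on a small made-up example; channel_accuracy is an illustrative helper, not part of the notebook or the FINN driver.

```python
import numpy as np

def channel_accuracy(bipolar_out, labels):
    """Illustrative helper: return (ok, nok) counts of per-channel matches."""
    binary_out = (np.asarray(bipolar_out) + 1) / 2  # {-1, 1} -> {0, 1}
    ok = int(np.count_nonzero(binary_out == labels))
    return ok, labels.size - ok

# Made-up outputs and labels for two inputs with four channels each.
out = np.array([[-1., 1., -1., 1.], [1., 1., -1., -1.]])
lab = np.array([[0., 1., 0., 0.], [1., 1., 0., 0.]])

ok, nok = channel_accuracy(out, lab)
print(ok, nok, 100.0 * ok / (ok + nok))  # 7 1 87.5
```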

In [11]:
mlp.throughput_test()

{'runtime[ms]': 2.2706985473632812,
 'throughput[images/s]': 944643.2255354893,
 'DRAM_in_bandwidth[MB/s]': 483.65733147417046,
 'DRAM_out_bandwidth[MB/s]': 3.7785729021419567,
 'fclk[mhz]': 100.0,
 'batch_size': 2145,
 'fold_input[ms]': 0.11682510375976562,
 'pack_input[ms]': 0.10251998901367188,
 'copy_input_data_to_device[ms]': 6.504535675048828,
 'copy_output_data_from_device[ms]': 0.29277801513671875,
 'unpack_output[ms]': 3247.3292350769043,
 'unfold_output[ms]': 0.07557868957519531}
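The throughput and bandwidth figures above can be reproduced from the runtime and batch size. The sketch below assumes one byte per input element (512 per sample) and one byte per output element (4 per sample); these byte counts are inferred from the numbers, not read from the driver.

```python
runtime_ms = 2.2706985473632812  # 'runtime[ms]' from the output above
batch_size = 2145

bytes_in_per_sample = 512   # assumption: one byte per INT8 input element
bytes_out_per_sample = 4    # assumption: one byte per output element

throughput = batch_size / (runtime_ms / 1000.0)          # inputs per second
dram_in_mb_s = throughput * bytes_in_per_sample / 1e6
dram_out_mb_s = throughput * bytes_out_per_sample / 1e6

print(round(throughput), round(dram_in_mb_s, 2), round(dram_out_mb_s, 2))
# 944643 483.66 3.78
```

The reported runtime is far smaller than the unpack_output time (about 3247 ms), so host-side unpacking, rather than the accelerator itself, appears to dominate the end-to-end time in this measurement.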

# Run Convolutional Neural Network (CNN)

## Get Test Data

The CNN expects a buffer of 2 by 16 by 16, and the expected labels are binary values indicating whether each of four channels is occupied. The I and Q samples are kept separate for the CNN, with each being input as a 16 by 16 array. Because of the PYNQ-Z2's limited storage, only 176 MB of the available 40 GB of data is tested, but this still gives 42900 test cases.

In [12]:
normalized_array, labels = get_data.get_cnn_data()

In [13]:
print(normalized_array.shape)
print(labels.shape)

(42900, 1, 16, 16, 2)
(42900, 4)
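One way to arrive at a (1, 16, 16, 2) input from separate I and Q samples is sketched below. The channel order (I first, Q second) and the row-by-row fill are assumptions made for illustration; get_data may arrange the samples differently.

```python
import numpy as np

# Random placeholder samples standing in for 256 I and 256 Q values.
rng = np.random.default_rng(0)
i_samples = rng.integers(-128, 128, size=256)
q_samples = rng.integers(-128, 128, size=256)

# Assumed layout: I in channel 0, Q in channel 1, each filled row by row.
cnn_input = np.stack(
    [i_samples.reshape(16, 16), q_samples.reshape(16, 16)], axis=-1
)[np.newaxis]

print(cnn_input.shape)  # (1, 16, 16, 2)
```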


## Initialise the CNN in Hardware

The bitfile generated in Vivado contains all the information needed to create the CNN in the programmable logic. The io_shape_dict describes how the data should be formatted for inputs and outputs. The overlay uses the bitfile to instantiate the model in hardware and sets up the DMA connections that pass data to it. The model can receive sets of data at up to 100 MHz, with an output produced every 16 sets of input.

In [14]:
from driver_final_cnn.driver import io_shape_dict

bitfile = "bitfile_final_cnn/finn-accel.bit"
cnn = FINNExampleOverlay(bitfile, driver_mode, io_shape_dict, fclk_mhz=100.0)

## Test with a Single Input

To check that the model can receive data and produce a prediction, a single input is sent to the CNN. The input data is converted to 8-bit integers, as this is what the model will receive from the rest of the radio.

In [15]:
test_single = normalized_array[1].astype(np.int8)
test_single_label = labels[1]

print(test_single.shape)
print(test_single_label)


(1, 16, 16, 2)
[0. 0. 0. 1.]


The overlay provides the execute function, which formats the data and sends it to the programmable logic, then waits until it receives an output and returns it.

In [16]:
cnn.batch_size = 1
accel_out = cnn.execute(test_single)

Since the model output is bipolar rather than binary, it uses -1 rather than 0 to represent no user in a channel, unlike the labels.

In [17]:
print(accel_out)

[[-1. -1. -1.  1.]]


## Test with All Data

To verify that the model functions as expected, the full available dataset is run through it. The bipolar outputs are converted to binary and compared with the labels to obtain the model accuracy.

In [18]:
# Split the input data into 20 equally sized batches (42900 / 2145 = 20)
batch_size = 2145
cnn.batch_size = batch_size
test_imgs, test_labels = normalized_array.astype(np.int8), labels

ok = 0
nok = 0
n_batches = test_imgs.shape[0] // batch_size
total = batch_size * n_batches

test_imgs = test_imgs.reshape(n_batches, batch_size, 16, 16, 2)
test_labels = test_labels.reshape(n_batches, batch_size, 4)

# Loop through the batches and execute each one on the CNN
for i in range(n_batches):
    inp = test_imgs[i]
    exp = test_labels[i].astype(np.float32).flatten()
    # Map bipolar outputs {-1, 1} to binary {0, 1} to match the labels
    out = (cnn.execute(inp).flatten() + 1) / 2
    ok += np.count_nonzero(out == exp)
    nok += np.count_nonzero(out != exp)
    acc = 100.0 * ok / (ok + nok)
    print("batch %d / %d : total OK %d NOK %d : accuracy %.2f%%" % (i + 1, n_batches, ok, nok, acc))

acc = 100.0 * ok / (ok + nok)
print("Final accuracy: {:.2f}%".format(acc))

batch 1 / 20 : total OK 6690 NOK 1890 : accuracy 77.97%
batch 2 / 20 : total OK 13476 NOK 3684 : accuracy 78.53%
batch 3 / 20 : total OK 20583 NOK 5157 : accuracy 79.97%
batch 4 / 20 : total OK 28104 NOK 6216 : accuracy 81.89%
batch 5 / 20 : total OK 36336 NOK 6564 : accuracy 84.70%
batch 6 / 20 : total OK 44711 NOK 6769 : accuracy 86.85%
batch 7 / 20 : total OK 53196 NOK 6864 : accuracy 88.57%
batch 8 / 20 : total OK 59916 NOK 8724 : accuracy 87.29%
batch 9 / 20 : total OK 65984 NOK 11236 : accuracy 85.45%
batch 10 / 20 : total OK 71712 NOK 14088 : accuracy 83.58%
batch 11 / 20 : total OK 77596 NOK 16784 : accuracy 82.22%
batch 12 / 20 : total OK 86105 NOK 16855 : accuracy 83.63%
batch 13 / 20 : total OK 94550 NOK 16990 : accuracy 84.77%
batch 14 / 20 : total OK 102823 NOK 17297 : accuracy 85.60%
batch 15 / 20 : total OK 109039 NOK 19661 : accuracy 84.72%
batch 16 / 20 : total OK 112756 NOK 24524 : accuracy 82.14%
batch 17 / 20 : total OK 119507 NOK 26353 : accuracy 81.93%
batch 18 / 

In [19]:
cnn.throughput_test()

{'runtime[ms]': 10.96200942993164,
 'throughput[images/s]': 195675.80321023098,
 'DRAM_in_bandwidth[MB/s]': 100.18601124363825,
 'DRAM_out_bandwidth[MB/s]': 0.7827032128409238,
 'fclk[mhz]': 100.0,
 'batch_size': 2145,
 'fold_input[ms]': 0.1163482666015625,
 'pack_input[ms]': 0.09512901306152344,
 'copy_input_data_to_device[ms]': 6.750822067260742,
 'copy_output_data_from_device[ms]': 0.3414154052734375,
 'unpack_output[ms]': 3181.0953617095947,
 'unfold_output[ms]': 0.08153915405273438}
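For a quick comparison of the two accelerators, the reported throughput figures can be turned into a ratio and a per-input time. This is simple arithmetic on the two throughput_test outputs above and says nothing about accuracy.

```python
# 'throughput[images/s]' values copied from the two throughput_test outputs.
mlp_throughput = 944643.2255354893
cnn_throughput = 195675.80321023098

ratio = mlp_throughput / cnn_throughput
mlp_us_per_input = 1e6 / mlp_throughput
cnn_us_per_input = 1e6 / cnn_throughput

print(f"MLP handles about {ratio:.1f}x as many inputs per second as the CNN")
print(f"MLP: {mlp_us_per_input:.2f} us per input, CNN: {cnn_us_per_input:.2f} us per input")
```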