# Initialize the accelerator

In [None]:
from finn_examples import models
print(list(filter(lambda x: "unsw_nb15" in x, dir(models))))

Specify a batch size and create the FINN overlay. The batch size must evenly divide 82000, the number of test records used in this notebook.
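
The divisibility requirement can be checked up front. The snippet below is a minimal sketch; `check_batch_size` is a hypothetical helper written for this illustration and is not part of `finn_examples`.

```python
N_TEST_RECORDS = 82000  # number of test records used in this notebook


def check_batch_size(bsize, n_records=N_TEST_RECORDS):
    """Hypothetical helper: return the number of batches, or raise if
    bsize does not evenly divide n_records."""
    if bsize <= 0 or n_records % bsize != 0:
        raise ValueError("batch size %d does not evenly divide %d" % (bsize, n_records))
    return n_records // bsize


print(check_batch_size(1))     # 82000 batches of 1 record
print(check_batch_size(1000))  # 82 batches of 1000 records
```

A batch size such as 7 would raise a `ValueError` here, because 82000 is not a multiple of 7.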

In [None]:
batch_size = 1
accel = models.mlp_w2a2_unsw_nb15()

In [None]:
print("Expected input shape and datatype: %s %s" % (str(accel.ishape_normal()), str(accel.idt())))
print("Expected output shape and datatype: %s %s" % (str(accel.oshape_normal()), str(accel.odt())))

# Load the binarized UNSW-NB15 test dataset

In [None]:
! wget -nc -O unsw_nb15_binarized.npz "https://zenodo.org/record/4519767/files/unsw_nb15_binarized.npz?download=1"

Note that the generated design expects inputs of length 600. As explained in the [end-to-end notebook](https://github.com/Xilinx/finn/blob/main/notebooks/end2end_example/cybersecurity/1-train-mlp-with-brevitas.ipynb) in the FINN repository, padding the input data from length 593 to 600 enables SIMD parallelization for the first layer.
Thus, we'll have to pad our dataset before feeding it to the accelerator.
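
The padding step can be tried on synthetic data first. This is a minimal sketch: the random array below is a made-up stand-in for the real binarized features, used only to show the effect of `np.pad` on the shape.

```python
import numpy as np

# Synthetic stand-in for the binarized features: 4 records of length 593.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(4, 593)).astype(np.float32)

# Append 7 zero columns at the end of axis 1; axis 0 (the records) is untouched.
padded = np.pad(features, [(0, 0), (0, 7)], mode="constant")

print(features.shape, "->", padded.shape)
```

The original 593 values of each record are unchanged, and the 7 appended positions are all zero.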

In [None]:
import numpy as np

def make_unsw_nb15_test_batches(bsize):
    # Use the first 82000 test records so the set splits evenly into batches
    unsw_nb15_data = np.load("unsw_nb15_binarized.npz")["test"][:82000]
    # All columns except the last are features; the last column is the label
    test_imgs = unsw_nb15_data[:, :-1]
    # Zero-pad each record from 593 to 600 features
    test_imgs = np.pad(test_imgs, [(0, 0), (0, 7)], mode="constant")
    test_labels = unsw_nb15_data[:, -1]
    n_batches = test_imgs.shape[0] // bsize
    test_imgs = test_imgs.reshape(n_batches, bsize, -1)
    test_labels = test_labels.reshape(n_batches, bsize)
    return (test_imgs, test_labels)

# Classify a single attack

In [None]:
(test_imgs, test_labels) = make_unsw_nb15_test_batches(bsize=1)

In [None]:
test_single = test_imgs[-1]
test_single_label = test_labels[-1].astype(np.float32)

print("Expected label is: %d (%s data)" % (test_single_label, (lambda x: "normal" if x==0 else "abnormal")(test_single_label)))

In [None]:
# Note: the accelerator expects binary input data presented in bipolar form (i.e. {-1, 1})
accel_in = 2 * test_single - 1
accel_out = accel.execute(accel_in)
# To convert back to the original label (i.e. {0, 1}), we'll have to map the bipolar output to binary
accel_out_binary = (accel_out + 1) / 2

In [None]:
print("Returned label is: %d (%s data)" % (accel_out_binary, (lambda x: "normal" if x==0 else "abnormal")(accel_out_binary)))
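
The binary/bipolar conversion used above is a simple affine mapping and its inverse. A minimal sketch on a hand-written array (no accelerator involved):

```python
import numpy as np

binary = np.array([0, 1, 1, 0], dtype=np.float32)

# {0, 1} -> {-1, +1}: the form the accelerator expects at its input
bipolar = 2 * binary - 1

# {-1, +1} -> {0, 1}: the mapping applied to the accelerator output
recovered = (bipolar + 1) / 2

print(bipolar)
print(recovered)
```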

# Validate accuracy on 82000 (out of 82332) records from the UNSW-NB15 test set

To increase throughput, let's increase the batch size. The FINN accelerator itself processes one sample at a time, but copying a larger chunk of the test set to the device buffer in one go keeps its compute pipeline full.
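
Splitting the test set into batches amounts to a single reshape. The sketch below uses a small synthetic array (12 records, batch size 4) in place of the real data to show how records map to batches.

```python
import numpy as np

n_records, n_features, bsize = 12, 600, 4

# Synthetic stand-in data: every element is unique so rows are easy to track.
data = np.arange(n_records * n_features, dtype=np.float32).reshape(n_records, n_features)

n_batches = n_records // bsize
batches = data.reshape(n_batches, bsize, -1)

print(batches.shape)  # (3, 4, 600)
```

Record `i` ends up at `batches[i // bsize][i % bsize]`, so the record order is preserved across batches.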

In [None]:
batch_size = 1000
accel.batch_size = batch_size
(test_imgs, test_labels) = make_unsw_nb15_test_batches(batch_size)

In [None]:
ok = 0
nok = 0
n_batches = test_imgs.shape[0]
total = batch_size*n_batches

In [None]:
for i in range(n_batches):
    inp = test_imgs[i].astype(np.float32)
    exp = test_labels[i].astype(np.float32)
    inp = 2 * inp - 1
    exp = 2 * exp - 1
    out = accel.execute(inp)
    matches = np.count_nonzero(out.flatten() == exp.flatten())
    nok += batch_size - matches
    ok += matches
    print("batch %d / %d : total OK %d NOK %d" % (i + 1, n_batches, ok, nok))

In [None]:
acc = 100.0 * ok / (total)
print("Final accuracy: {:.2f}%".format(acc))
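
The per-batch bookkeeping in the loop above can be checked in isolation. This is a minimal sketch with hand-written bipolar arrays standing in for the expected labels and the accelerator output; the output shape `(5, 1)` is an assumption made for illustration.

```python
import numpy as np

# Hand-written bipolar labels and predictions for one batch of 5 records.
expected = np.array([1, -1, 1, 1, -1], dtype=np.float32)
predicted = np.array([[1], [-1], [-1], [1], [-1]], dtype=np.float32)

# Flatten both so that differing shapes do not trigger broadcasting.
matches = np.count_nonzero(predicted.flatten() == expected.flatten())
mismatches = expected.size - matches
accuracy = 100.0 * matches / expected.size

print("OK %d NOK %d accuracy %.2f%%" % (matches, mismatches, accuracy))
```

Flattening matters: comparing a `(5, 1)` array against a `(5,)` array directly would broadcast to a `(5, 5)` result and inflate the match count.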

In [None]:
def run_validation():
    for i in range(n_batches):
        ibuf_normal = test_imgs[i].reshape(accel.ishape_normal())
        accel.execute(ibuf_normal)

In [None]:
full_validation_time = %timeit -n 1 -o run_validation()

In [None]:
print("%f samples per second including data movement" % (total / float(full_validation_time.best)))
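
Outside of a notebook, where the `%timeit` magic is unavailable, a similar measurement can be taken with the standard `timeit` module. The sketch below is an assumption-laden illustration: `run_validation_standin` is a hypothetical CPU-only placeholder for the accelerator loop, and the workload sizes are made up.

```python
import timeit
import numpy as np

# Hypothetical stand-in workload: 8 batches of 100 records with 600 features each.
n_batches, batch_size, n_features = 8, 100, 600
batches = np.zeros((n_batches, batch_size, n_features), dtype=np.float32)
total = n_batches * batch_size


def run_validation_standin():
    # Placeholder for accel.execute(): each batch is just summed on the CPU.
    for i in range(n_batches):
        batches[i].sum()


# number=1 runs the function once per measurement; repeat=3 takes three measurements.
times = timeit.repeat(run_validation_standin, number=1, repeat=3)
best = min(times)

print("%f samples per second (stand-in workload)" % (total / best))
```

Taking the minimum over the repeats mirrors the use of `.best` above, which reports the fastest run.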

# More benchmarking

In [None]:
accel.throughput_test()

The measured `throughput` of the accelerator, excluding any software and data movement overhead, is influenced by the batch size. The more we fill the compute pipeline, the higher the throughput.
Note that the total runtime consists of three parts: the overhead of packing/unpacking the inputs/outputs to convert from numpy arrays to the bit-contiguous data representation our accelerator expects (`pack_input`/`unpack_output`), the cost of moving data between the CPU and accelerator memories (`copy_input_data_to_device`/`copy_output_data_from_device`), and the accelerator's execution time.
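
To give a feel for what a bit-contiguous representation means, the sketch below packs a 600-element binary vector into bytes with NumPy. This is an illustration only: `np.packbits` is used as a stand-in, and the FINN driver's actual `pack_input` may differ in bit order, padding, and datatype handling.

```python
import numpy as np

# One binary input record of length 600, stored as one byte per value.
bits = np.zeros(600, dtype=np.uint8)
bits[::3] = 1

# Pack 8 values into each byte: 600 one-bit values fit into 75 bytes.
packed = np.packbits(bits)

# Unpacking restores the original one-value-per-element layout.
unpacked = np.unpackbits(packed)[:600]

print(bits.nbytes, "bytes unpacked ->", packed.nbytes, "bytes packed")
```

The conversion in both directions runs on the CPU for every batch, which is why it counts toward the total runtime.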