# QBART: Minimal Viable Product Edition

*Welcome to QBART, the Quantized, Bitserial, AcceleRaThor!*

<img src="logo.png" width="400" height="400">

In this MVP implementation, the QBART team has prepared the following:
- Two layers run on the FPGA (the actual accelerator): thresholding and fully connected. A padding unit is also available for several layers.
- All the other layers run on the Cortex-A9 cores: pooling, convolution, sliding window.
- We use little to no BRAM on the FPGA: most I/O goes directly to DRAM, and there is no custom memory hierarchy for the FPGA, so memory performance is suboptimal.
- We use the GTSRB benchmark as the default for testing.

All in all, the MVP might not accelerate anything at all, so it is more a proof of concept; future iterations should actually make QBART faster than the pure CPU implementation.

Alright, let's get to it!

Requirements:
- A trained QNN that is pickled and formatted like the GTSRB benchmark.
- The QNN must be placed on the PYNQ, and you must edit the QNN path below so that QBART can find and work on it.
- The images must be placed in a separate folder, and you must set the image path accordingly.

Alright, with the requirements done, we do the following:
1. Run all image classifications on QBART, and time it.
2. Run all image classifications on a pure, correct CPU implementation, and time it.
3. Check if both QBART and the CPU implementation agree. If both implementations agree on all image classifications, we know that the QBART implementation is correct.
4. Present the results to the user.
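
The timing part of these steps could be sketched as follows. This is only a minimal illustration, not the QBART code: `time_classifier`, `classify_on_qbart`, and `classify_on_cpu` are hypothetical names, and the two classifiers are trivial stand-ins operating on dummy data.

```python
import time


def time_classifier(classify, images):
    """Run `classify` on every image and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [classify(image) for image in images]
    elapsed = time.perf_counter() - start
    return results, elapsed


# Hypothetical stand-ins for the two implementations (assumption: each one
# maps a single image to a single class index).
def classify_on_qbart(image):
    return sum(image) % 3


def classify_on_cpu(image):
    return sum(image) % 3


# Dummy "images": short lists of numbers instead of real pixel data.
images = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

qbart_results, qbart_seconds = time_classifier(classify_on_qbart, images)
cpu_results, cpu_seconds = time_classifier(classify_on_cpu, images)

print("Implementations agree:", qbart_results == cpu_results)
print("QBART: %.6f s, CPU: %.6f s" % (qbart_seconds, cpu_seconds))
```

Timing both runs through the same helper keeps the measurement overhead identical for the two implementations.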

TODO(N35N0M): We should consider a parameter for choosing how many of the images we want to run. The GTSRB test set is quite large, and if running all of the images takes hours, then we should only run a small subset for implementation correctness testing.
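
One possible shape for such a parameter is sketched below. The helper `select_subset` and its `max_items` and `seed` arguments are hypothetical and not part of QBART; the sketch assumes the images (or their file names) are held in a plain Python list.

```python
import random


def select_subset(items, max_items=None, seed=None):
    """Return at most `max_items` entries of `items`.

    With `max_items=None` everything is returned. With a `seed`, a
    reproducible random sample is drawn; without one, the first
    `max_items` entries are used.
    """
    if max_items is not None and max_items < 0:
        raise ValueError("max_items must be non-negative")
    if max_items is None or max_items >= len(items):
        return list(items)
    if seed is None:
        return list(items[:max_items])
    rng = random.Random(seed)
    return rng.sample(list(items), max_items)


# Usage with placeholder file names.
file_names = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
print(select_subset(file_names, max_items=2))
print(select_subset(file_names, max_items=2, seed=42))
```

A seeded sample keeps correctness runs repeatable while still covering more than just the first few files in the folder.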

# Step 1: Running all image classifications on QBART

from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import sys
from QNN import *
import pickle
import os
from time import sleep

# CREDIT: Parts of the code are simply extracted from the tutorial code provided by the course instructors.
# Should be available at: https://github.com/maltanar/qnn-inference-examples

qnn_path = "gtsrb-w1a1.pickle" # This must be submitted by the user.
image_dir = "gtsrb_images"  # This must be submitted by the user, relative to home folder.

# Loads all the images from the folder provided by the user, 
# returns a python list of references to those images loaded as PIL images.
# Should be placed in a separate function file when debug is finished.
def load_images(image_dir):
    # If no image directory is given, we return a sentinel value to indicate an error.
    if image_dir is None:
        print("No image directory specified.")
        return -1
    
    # Else we simply read in files
    images = []
    
    print("Starts to read files")
    files = os.listdir(image_dir)
    
    for image in files:
        # listdir only returns file names, so join each one with the directory to get a usable path.
        name = os.path.join(image_dir, image)
        
        img = Image.open(name) 
        img = np.asarray(img)
        images.append(img)
        
                
    return images
                

print("This folder currently has the following images: " + str(os.listdir(image_dir)))

# TODO: Add error handling in the case of -1. Maybe the function should just raise a fatal error and halt the program?
images = load_images(image_dir)
print(images)

plt.imshow(images[1], cmap='gray')
sleep(1)
    
# Loads the QNN from a Python 2 pickle file and returns a list of the layers, identical to the tutorial code.
def load_qnn(qnn_filepath):
    if qnn_filepath is None:
        return -1
    # Load the QNN; the with-block closes the file even if unpickling fails.
    with open(qnn_filepath, "rb") as qnn_file:
        qnn = pickle.load(qnn_file)
    return qnn

qnn = load_qnn(qnn_path)
print(qnn)

# TODO: Add distribution of images if Beowulf cluster 
# is to be implemented (utilizing several PYNQs to exploit some DLP (working on entire images)).
# The work to be done should be load balanced between all PYNQs.
#
# Some quantitative measurement should be done on the implementation, to see if the image transfer overhead will
# outweigh the DLP benefits.

#def beowulf_distribute():
    # TODO: Define something here.
    

# In our MVP implementation, we do not support running inference on several images at once on the same FPGA,
# so inference is simply a matter of going through all the given images, one at a time, layer by layer,
# saving the final classification for each image.
#
# Basically the predict function from "layers.py", but will be heavily modified.

qbart_classifications = []

for image in images:
    activations = image
    
    for layer in qnn:
        # Each layer will either do calculations on the A9 or the FPGA.
        # Everything that the FPGA is unable to do, simply runs on the CPU.
        #
        # It will initially look very similar to a lot of the provided "layers.py", but should in the end be entirely
        # different when the FPGA implementations are finished.
        #
        # If there is no FPGA implementation, we simply reuse the provided software implementation by calling the
        # "execute" method of each layer object.
        
        # CONVOLUTION LAYER
        if (layer.layerType() == "QNNConvolutionLayer"):
            activations = layer.execute(activations)
            
        # FULLY CONNECTED LAYER
        elif (layer.layerType() == "QNNFullyConnectedLayer"):
            activations = layer.execute(activations)
            
        # POOLING LAYER
        elif (layer.layerType() == "QNNPoolingLayer"):
            activations = layer.execute(activations)
            
        # THRESHOLDING LAYER
        elif (layer.layerType() == "QNNThresholdingLayer"):
            activations = layer.execute(activations)
        
        # SCALESHIFT LAYER
        elif (layer.layerType() == "QNNScaleShiftLayer"):
            activations = layer.execute(activations)
            
        # PADDING LAYER
        elif (layer.layerType() == "QNNPaddingLayer"):
            activations = layer.execute(activations)
            
        # SLIDING WINDOW LAYER
        elif (layer.layerType() == "QNNSlidingWindowLayer"):
            activations = layer.execute(activations)
            
        # LINEAR LAYER
        elif (layer.layerType() == "QNNLinearLayer"):
            activations = layer.execute(activations)
            
        # SOFTMAX LAYER
        elif (layer.layerType() == "QNNSoftmaxLayer"):
            activations = layer.execute(activations)
            
        # ReLU LAYER
        elif (layer.layerType() == "QNNReLULayer"):
            activations = layer.execute(activations)
            
        # BIPOLAR THRESHOLDING LAYER (Can't this be replaced by the general thresholding layer?)
        elif (layer.layerType() == "QNNBipolarThresholdingLayer"):
            activations = layer.execute(activations)
            
        else:
            # Raise error, we are asked to perform a layer operation we do not know.
            raise ValueError("Invalid layer type.")
    
    # After all the layers have been performed, the final classification should be extracted and saved.
    # Preferably as a tuple with (image name, classification), to clearly label output data.
    qbart_classifications.append(np.argmax(activations))
    
# After all the images have been classified, we collect the results from the rest of the Beowulf cluster, if it exists.

# Then, when all results are done, we move on to the CPU implementation.

This folder currently has the following images: ['left.jpg', 'stop.jpg', 'right.jpg', '50.jpg']
Starts to read files
[array([[[255, 255, 253],
        [255, 255, 253],
        [255, 255, 255],
        ..., 
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 253],
        [255, 255, 253],
        [255, 255, 255],
        ..., 
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 251],
        [255, 255, 253],
        [255, 255, 255],
        ..., 
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ..., 
       [[157, 168, 200],
        [136, 147, 179],
        [143, 156, 190],
        ..., 
        [171, 184, 229],
        [145, 158, 200],
        [130, 139, 178]],

       [[147, 159, 185],
        [140, 152, 178],
        [160, 174, 201],
        ..., 
        [160, 172, 222],
        [143, 155, 203],
        [132, 140, 186]],

       [[139, 151, 175],
        [144, 

ValueError: total size of new array must be unchanged
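
This message is the one NumPy raises when an array is reshaped to a shape with a different number of elements. A likely cause is that the images are passed to the network exactly as PIL loaded them, in (rows, columns, channels) layout and at their original resolution. The sketch below shows one way to check the layout before inference. It is not the QBART code: `to_network_layout` is a hypothetical helper, and the expected input shape of (3, 32, 32) is an assumption that should be verified against the trained network.

```python
import numpy as np


def to_network_layout(img, expected_shape=(3, 32, 32)):
    """Transpose an image from (rows, cols, channels) to (channels, rows, cols).

    Raises ValueError if the transposed image does not match `expected_shape`
    (an assumed network input shape, not taken from the QNN itself).
    """
    img = np.asarray(img)
    if img.ndim != 3:
        raise ValueError("Expected a 3-D image array, got %d dimensions" % img.ndim)
    chw = img.transpose((2, 0, 1))
    if chw.shape != tuple(expected_shape):
        raise ValueError(
            "Image has shape %s after transposing, but the network expects %s; "
            "resize the image first." % (chw.shape, tuple(expected_shape))
        )
    return chw


# Usage with a dummy 32x32 RGB image instead of a real GTSRB sample.
dummy = np.zeros((32, 32, 3), dtype=np.uint8)
print(to_network_layout(dummy).shape)
```

Images of a different size would need to be resized before this check passes, for example with PIL's `Image.resize`. Whether the network also expects a particular channel order (RGB or BGR) depends on how it was trained.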

# Step 2: Running all image classifications on a CPU implementation 

# Step 3: Implementation correctness testing
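
A minimal sketch of how the agreement check could look, assuming both implementations produce one class index per image in the same order. The helper `find_disagreements` is hypothetical, and the labels in the example are made up purely for illustration; they are not measured results.

```python
def find_disagreements(names, qbart_labels, cpu_labels):
    """Return (name, qbart_label, cpu_label) for every image where the two differ."""
    if not (len(names) == len(qbart_labels) == len(cpu_labels)):
        raise ValueError("All three lists must have the same length")
    return [
        (name, q, c)
        for name, q, c in zip(names, qbart_labels, cpu_labels)
        if q != c
    ]


# Illustrative input only: the class indices below are invented.
names = ["left.jpg", "stop.jpg", "right.jpg", "50.jpg"]
qbart_labels = [34, 14, 33, 2]
cpu_labels = [34, 14, 34, 2]

mismatches = find_disagreements(names, qbart_labels, cpu_labels)
if mismatches:
    for name, q, c in mismatches:
        print("%s: QBART says %d, CPU says %d" % (name, q, c))
else:
    print("QBART and the CPU implementation agree on all images.")
```

Returning the mismatching entries rather than a single boolean makes it easier to see which images expose a faulty layer.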

# Step 4: Presentation of results