# QBART: Minimal Viable Product Edition

*Welcome to QBART, the Quantized, Bitserial, AcceleRaThor!*

<img src="logo.png" width="400" height="400">

In this MVP implementation, the QBART team has prepared the following:
- Two layers run on the FPGA (the actual accelerator): thresholding and fully connected. A padding unit is also available for several layers.
- All other layers run on the Cortex-A9s: pooling, convolution, sliding window.
- We use little to no BRAM on the FPGA, since most I/O goes directly to DRAM, and we have no custom memory hierarchy for the FPGA, so memory performance is suboptimal.
- We use the GTSRB benchmark as the default in testing.

All in all, the MVP might not accelerate anything at all, so it is more of a proof of concept; future iterations are intended to make it faster than the reference CPU implementation.

Alright, let's get to it!

Requirements:
- A trained QNN that is pickled and formatted like the GTSRB benchmark network.
- The pickle must be placed on the PYNQ, and you must edit the QNN path below so that QBART can find and work on it.
- The image(s) must also be placed in a separate folder, and you must set the image path accordingly.

Alright, with the requirements done, we do the following:
1. Run all image classifications on QBART, and time it.
2. Run all image classifications on a pure, correct CPU implementation, and time it.
3. Check if both QBART and the CPU implementation agree. If both implementations agree on all image classifications, we know that the QBART implementation is correct.
4. Present the results to the user.
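
Steps 1 and 2 above both amount to running a classifier over every image and timing it. A minimal sketch of that pattern, assuming a hypothetical helper `time_classifier` and a toy stand-in classifier (neither is part of QBART or the tutorial code), and using `time.perf_counter` rather than the `time.time` calls in the cells below:

```python
from time import perf_counter


def time_classifier(classify, images):
    """Run classify on every image; return (classifications, elapsed seconds)."""
    start = perf_counter()
    classifications = [classify(image) for image in images]
    elapsed = perf_counter() - start
    return classifications, elapsed


def toy_classifier(image):
    """Hypothetical stand-in: 'classify' an image as the index of its largest value."""
    return image.index(max(image))


# Toy data, not real images.
toy_images = [[0, 5, 1], [9, 2, 3], [1, 1, 7]]
labels, seconds = time_classifier(toy_classifier, toy_images)
print(labels)  # [1, 0, 2]
```

Passing the classifier in as a function means the same timing code could wrap both the QBART path and the CPU path, so the two measurements cover the same work.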

TODO(N35N0M): We should consider a parameter for choosing how many of the images we want to run. The GTSRB test set, for example, is quite large, and if running all of the images takes hours (figuratively or literally), then we should only run a small subset for implementation correctness testing. Update: With tutorial code, it takes 3-5 hours to run the entire test set. Yikes.
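
One way the subset parameter from this TODO could look is sketched below. The function name `select_subset` and its `max_items` and `seed` parameters are assumptions made for illustration, not existing QBART options. Sampling with a fixed seed keeps the chosen subset the same between runs, so QBART and the CPU implementation would see identical images.

```python
import random


def select_subset(items, max_items=None, seed=0):
    """Return at most max_items items, sampled reproducibly.

    If max_items is None or not smaller than the list, all items are returned in order.
    """
    items = list(items)
    if max_items is None or max_items >= len(items):
        return items
    rng = random.Random(seed)  # private generator, does not disturb global random state
    return rng.sample(items, max_items)


# Hypothetical file names, standing in for os.listdir(image_dir).
file_names = ["%05d.ppm" % n for n in range(100)]
subset = select_subset(file_names, max_items=10)
print(len(subset))  # 10
```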

TODO(N35N0M): Generalize the notebook more, maybe with a config or similar that provides the QNN's expected layout, the image layout, etc., so that we can transform the data into the format the QNN expects and work on it. Right now the code is very specialized toward handling GTSRB images. That's okay for the MVP, but not for the final product.

# Step 1: Running all image classifications on QBART

In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import sys
from QNN import *
import pickle
import os
from time import sleep, time
import copy
import logging

# Console handler for logging, courtesy of pythondocs.
# Creds: https://stackoverflow.com/questions/13733552/logger-configuration-to-log-to-file-and-print-to-stdout
logFormatter = logging.Formatter("%(asctime)s [%(threadName)-12.12s] [%(levelname)-5.5s]  %(message)s")
rootLogger = logging.getLogger()
rootLogger.setLevel(logging.INFO)

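# Write the log file to the current working directory. (Formatting with an empty directory
# name would produce "/MVP_QBART_LOG.log", i.e. a file in the filesystem root.)
fileHandler = logging.FileHandler("MVP_QBART_LOG.log")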
fileHandler.setFormatter(logFormatter)
rootLogger.addHandler(fileHandler)

consoleHandler = logging.StreamHandler()
consoleHandler.setFormatter(logFormatter)
rootLogger.addHandler(consoleHandler)

# This, for example, should be an input parameter, so that providing categories becomes the user's responsibility.
# If none are provided, one can simply use "category 0", "category 1", "category 2", etc.
gtsrb_classes = [
    "speed limit, 20", "speed limit, 30", "speed limit, 50", "speed limit, 60", "speed limit, 70",
    "speed limit, 80", "speed limit, slash 80", "speed limit, 100", "speed limit, 120",
    "no overtaking allowed", "no overtaking by trucks allowed", "intersection ahead", "priority road",
    "yield", "stop", "no vehicles allowed?", "vans only?", "no entry", "warning sign !",
    "sharp left turn", "sharp right turn", "two sharp turns ahead", "road bumps ahead",
    "slippery roads ahead", "merge ahead?", "construction work ahead", "traffic light ahead",
    "pedestrians ahead", "school crossing/children playing ahead", "bicyclists ahead", "snow ahead",
    "deer warning", "national speed limit applies", "must turn right", "must turn left",
    "must drive straight forward", "must drive either forward or turn right",
    "must drive either forward, or turn left", "must drive on the right side of this sign",
    "must drive on the left side of this sign", "roundabout", "overtaking is allowed again",
    "vans may overtake again",
]

# CREDIT: Parts of the code are simply extracted from the tutorial code provided by the course instructors.
# Should be available at: https://github.com/maltanar/qnn-inference-examples

qnn_path = "gtsrb-w1a1.pickle"         # This must be submitted by the user.
image_dir = "GTSRB/Final_Test/Images"  # This must be submitted by the user, relative to home folder.


# Loads all the images from the folder provided by the user, 
# returns a python list of references to those images loaded as PIL images.
def load_images(image_dir):
    # If the image directory is missing or empty, return a sentinel value to indicate an error.
    if image_dir is None or not os.path.isdir(image_dir) or not os.listdir(image_dir):
        logging.error("There are no files in the specified image folder.")
        return -1
    
    # TODO: Current implementation does not distinguish between image files and others.
    # A solution is to see which image formats pillow fully supports, and then only support these, throwing
    # a warning to logger for the rest.
    
    
    images = []
    
    logging.info("Loading files with Pillow and converting them into numpy arrays.")
    print("Y u no print logger")
    files = os.listdir(image_dir)
    
    for image in files:
        # listdir only returns the file name, not the relative path, so we join it with the directory.
        name = os.path.join(image_dir, image)
        
        img = Image.open(name)
        # Resize to expected img dimensions (from tutorial code)
        img = img.resize((32, 32))
        img = np.asarray(img)
        # Rearrange the data layout to channels, rows, columns (from tutorial code)
        img = img.transpose((2, 0, 1))
        # Use BGR instead of RGB, since the provided network is like that. (from tutorial code)
        img = img[::-1, :, :]

        images.append(copy.copy(img))
        
    logging.info("We are now finished with loading images.")            
    return images
                

logging.info("This folder currently has the following images: " + str(os.listdir(image_dir)))

# TODO: Add error handling in the case of -1. Maybe the function should just raise a fatal error and halt the program?
images = load_images(image_dir)

    
# Loads the QNN from a python2 pickle file, and returns a list of the layers, similar to the tutorial code.
def load_qnn(qnn_filepath):
    
    logging.info("Currently trying to load provided QNN.")
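    if qnn_filepath is None or not os.path.isfile(qnn_filepath):
        logging.error("There isn't a file in the specified QNN-location.")
        return -1
    
    # Load the QNN, closing the file handle once unpickling is done.
    with open(qnn_filepath, "rb") as qnn_file:
        qnn = pickle.load(qnn_file)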
    
    logging.info("Successfully loaded QNN.")
    logging.debug("The QNN has the following contents: " + str(qnn))
    
    return qnn

qnn = load_qnn(qnn_path)

# We want to time the interval from AFTER the images are loaded until all inferences are done.

logging.info("Starting the timer to time QBART.")
qbart_start = time()

# TODO: Add distribution of images if a Beowulf cluster 
# is to be implemented (utilizing several PYNQs to exploit some DLP (working on entire images)).
# The work to be done should be load balanced between all PYNQs.
#
# Some quantitative measurements should be done on the implementation, to see whether image transfer overhead will
# outweigh the DLP benefits.

#def beowulf_distribute():
    # TODO: Define something here.
    

# In our MVP implementation, we do not support inferencing several images at once on the same FPGA,
# so inference is simply a matter of going through all the given images, one at a time. Layer for layer,
# saving the final classification for each image. 
#
# Basically the predict function from "layers.py", but will be heavily modified.

qbart_classifications = []

# Hard lesson learned the hard way, volume 1:
# **ALWAYS** check that the input image format (channels and dimensions) matches what the QNN expects,
# otherwise it won't be able to run properly. It will also most likely give you very incorrect results if it actually runs.
# TODO: Add image format checking (all the parameters you need should exist in the provided QNN) 
# TODO: Do something less hacky than a counter i.
i = 0
for image in images:
    activations = image
    
    for layer in qnn:
        # Each layer will either do calculations on the A9 or the FPGA.
        # Everything that the FPGA is unable to do simply runs on the CPU.
        #
        # It will initially look very similar to a lot of the provided "layers.py", but should in the end be entirely
        # different when the FPGA implementations are finished.
        #
        # If there is no FPGA implementation, we simply reuse the provided software implementation by calling the "execute"
        # method of each layer object.
        
        # CONVOLUTION LAYER
        if (layer.layerType() == "QNNConvolutionLayer"):
            activations = layer.execute(activations)
        
        # FULLY CONNECTED LAYER
        elif (layer.layerType() == "QNNFullyConnectedLayer"):
            activations = layer.execute(activations)
        
        # POOLING LAYER
        elif (layer.layerType() == "QNNPoolingLayer"):
            activations = layer.execute(activations)
        
        # THRESHOLDING LAYER
        elif (layer.layerType() == "QNNThresholdingLayer"):
            activations = layer.execute(activations)
        
        # SCALESHIFT LAYER
        elif (layer.layerType() == "QNNScaleShiftLayer"):
            activations = layer.execute(activations)
        
        # PADDING LAYER
        elif (layer.layerType() == "QNNPaddingLayer"):
            activations = layer.execute(activations)
        
        # SLIDING WINDOW LAYER
        elif (layer.layerType() == "QNNSlidingWindowLayer"):
            activations = layer.execute(activations)
        
        # LINEAR LAYER
        elif (layer.layerType() == "QNNLinearLayer"):
            activations = layer.execute(activations)
        
        # SOFTMAX LAYER
        elif (layer.layerType() == "QNNSoftmaxLayer"):
            activations = layer.execute(activations)
        
        # ReLU LAYER
        elif (layer.layerType() == "QNNReLULayer"):
            activations = layer.execute(activations)
        
        # BIPOLAR THRESHOLDING LAYER (Can't this be replaced by the general thresholding layer?)
        elif (layer.layerType() == "QNNBipolarThresholdingLayer"):
            activations = layer.execute(activations)
        
        else:
            # Raise error, we are asked to perform a layer operation we do not know.
            raise ValueError("Invalid layer type.")
        
    qbart_classifications.append(np.argmax(activations))
    logging.info("Finished classifying image " + str(i) + " of " + str(len(images)))
    
    # This magic variable simply keeps track of which image we are currently processing.
    i+=1
qbart_end = time()
logging.info("Timed stoptime, QBART is finished with all classifications!")


logging.info("QBART used a total of " + str(qbart_end - qbart_start) + " seconds to classify these " + str(len(images)) + " images.")

# Remember that we are just executing a QNN, so mispredictions are no indicator of failure/success.
# The classifications are only there so that one can test that QBART actually runs properly.
for i in range(len(qbart_classifications)):
    logging.info("Image " + str(i) + " was classified as " + str(gtsrb_classes[qbart_classifications[i]]))

INFO:root:This folder currently has the following images: ['04077.ppm', '07070.ppm', '08181.ppm', '03660.ppm', '00510.ppm', '11646.ppm', '12423.ppm', '02890.ppm', '01103.ppm', '09052.ppm', '12217.ppm', '00190.ppm', '00908.ppm', '03376.ppm', '11536.ppm', '10178.ppm', '02397.ppm', '06938.ppm', '00218.ppm', '06518.ppm', '08423.ppm', '06187.ppm', '04442.ppm', '08343.ppm', '08720.ppm', '12228.ppm', '12511.ppm', '01117.ppm', '10607.ppm', '05873.ppm', '00862.ppm', '05176.ppm', '03484.ppm', '12208.ppm', '01060.ppm', '07839.ppm', '05099.ppm', '06594.ppm', '10783.ppm', '11656.ppm', '12225.ppm', '09884.ppm', '02154.ppm', '09281.ppm', '08669.ppm', '12402.ppm', '07597.ppm', '01848.ppm', '08825.ppm', '08404.ppm', '04330.ppm', '11252.ppm', '08354.ppm', '07763.ppm', '07147.ppm', '08817.ppm', '06371.ppm', '08457.ppm', '04479.ppm', '07296.ppm', '05044.ppm', '12272.ppm', '05673.ppm', '10283.ppm', '09557.ppm', '10866.ppm', '08563.ppm', '09869.ppm', '00835.ppm', '09175.ppm', '04491.ppm', '08535.ppm', '0874

Y u no print logger


INFO:root:We are now finished with loading images.
We are now finished with loading images.
2017-10-16 11:01:27,923 [MainThread  ] [INFO ]  We are now finished with loading images.
2017-10-16 11:01:27,923 [MainThread  ] [INFO ]  We are now finished with loading images.
2017-10-16 11:01:27,923 [MainThread  ] [INFO ]  We are now finished with loading images.
INFO:root:Currently trying to load provided QNN.
Currently trying to load provided QNN.
2017-10-16 11:01:27,990 [MainThread  ] [INFO ]  Currently trying to load provided QNN.
2017-10-16 11:01:27,990 [MainThread  ] [INFO ]  Currently trying to load provided QNN.
2017-10-16 11:01:27,990 [MainThread  ] [INFO ]  Currently trying to load provided QNN.
INFO:root:Successfully loaded QNN.
Successfully loaded QNN.
2017-10-16 11:01:28,081 [MainThread  ] [INFO ]  Successfully loaded QNN.
2017-10-16 11:01:28,081 [MainThread  ] [INFO ]  Successfully loaded QNN.
2017-10-16 11:01:28,081 [MainThread  ] [INFO ]  Successfully loaded QNN.
INFO:root:Sta
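
In the output above, every message appears several times in different formats. That pattern is consistent with handlers accumulating on the root logger, for example when the setup cell is run more than once in the same kernel, although the notebook itself does not confirm the cause. A minimal sketch of a setup that can be re-run safely, assuming a hypothetical helper `configure_logging` (not part of QBART):

```python
import logging
import sys


def configure_logging(log_path=None, level=logging.INFO):
    """Attach handlers to the root logger, replacing any that are already attached."""
    root = logging.getLogger()
    root.setLevel(level)

    # Drop existing handlers so that re-running this does not duplicate every message.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s [%(levelname)-5.5s]  %(message)s")

    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    root.addHandler(console)

    # The log file is optional in this sketch.
    if log_path is not None:
        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)
    return root


configure_logging()
configure_logging()  # the second call replaces the console handler instead of adding another
logging.info("This message is printed once.")
```

Removing every existing root handler is a blunt choice. It suits a single notebook, but would be too aggressive inside a library that shares the root logger with other code.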

# Step 2: Running all image classifications on a CPU implementation 

## 2.1 Using the code from qnn-inference-examples (GTSRB only)
With some mods and assumptions in order to process a lot of images instead of just one.
This also required some modding of the provided `gtsrb_predict`.

In [None]:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display

# Here we assume that the images are in channel, row, column layout, in BGR color.
tutorial_classifications = []

tutorial_start = time()
for image in images:
    tutorial_classifications.append(gtsrb_predict(image))
    
tutorial_stop = time()

tutorial_time_total = tutorial_stop - tutorial_start

## 2.2 Using Yaman's GEMMBITserial implementation (much faster?)

In [None]:
# TODO: Implement this, and time it. - https://github.com/maltanar/gemmbitserial
# C++ library, which means we have to do the following:
# Mod the most relevant layers that use matrix multiplication (hint hint convolution and fc hinthint)
# ??
# Great success

# Haven't done this before, but several suggestions can be found in: http://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/c++-wrapping.html
# 



# Step 3: Implementation correctness testing

In [None]:
# TODO: Take the inferences from qbart, and the inferences from tutorial code. If the lists are equal, we are done.
# If not, there are potential bugs or wrong assumptions somewhere.
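
A sketch of how this comparison could look, assuming both lists hold one entry per image in the same order and use the same representation (for example, both class indices). The helper `find_mismatches` and the example labels are made up for illustration. Reporting the positions that differ, rather than only checking the lists for equality, makes it easier to see which images to investigate.

```python
def find_mismatches(expected, actual):
    """Return (index, expected, actual) for every position where the two lists disagree."""
    if len(expected) != len(actual):
        raise ValueError(
            "Classification lists differ in length: %d vs %d" % (len(expected), len(actual))
        )
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


# Made-up labels, not real results.
cpu_labels = [14, 3, 25, 1]
qbart_labels = [14, 3, 12, 1]

mismatches = find_mismatches(cpu_labels, qbart_labels)
print(mismatches)  # [(2, 25, 12)]
```

An empty result means the two implementations agree on every image.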

# Step 4: Presentation of results

In [None]:
# TODO: Present results. The most important one here is just presenting the various times.
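
A sketch of how the times could be presented, assuming two hypothetical helpers, `summarize_timing` and `speedup`. The numbers below are placeholders chosen for the example, not measurements from QBART or the tutorial code.

```python
def summarize_timing(name, seconds, image_count):
    """Format total and per-image time for one implementation."""
    per_image = seconds / image_count if image_count else float("nan")
    return "%s: %.2f s total, %.4f s per image (%d images)" % (
        name, seconds, per_image, image_count
    )


def speedup(baseline_seconds, candidate_seconds):
    """Values above 1.0 mean the candidate is faster than the baseline."""
    return baseline_seconds / candidate_seconds


# Placeholder values, not real measurements.
print(summarize_timing("QBART", 12.5, 50))  # QBART: 12.50 s total, 0.2500 s per image (50 images)
print(summarize_timing("CPU", 10.0, 50))    # CPU: 10.00 s total, 0.2000 s per image (50 images)
print("Speedup: %.2fx" % speedup(10.0, 12.5))  # Speedup: 0.80x
```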