# QBART: Minimal Viable Product Edition

*Welcome to QBART, the Quantized, Bitserial, AcceleRaThor!*

<img src="logo.png" width="400" height="400">

In this MVP implementation, the QBART team has prepared the following:
- Two layers run on the FPGA (the actual accelerator): thresholding and fully connected. A padding unit is also available for several layers.
- All the other layers run on the Cortex-A9s: pooling, convolution, sliding window.
- We use little to no BRAM on the FPGA: most I/O goes directly to DRAM and there is no custom memory hierarchy for the FPGA, so memory performance is suboptimal.
- We use the GTSRB-benchmark as the default in testing.

All in all, the MVP might not accelerate anything at all; it is more of a proof of concept, and future iterations are meant to make QBART faster than the CPU implementation.

Alright, let's get to it!

## Requirements:
- A trained QNN that is pickled and formatted similarly to the GTSRB benchmark.
- This must be placed on the PYNQ, and you must edit the QNN path below so that QBART can find and work on it.
- The image(s) must also be placed in a separate folder, and you must set the image path accordingly.

Alright, with the requirements done, we do the following:
1. Run all image classifications on QBART, and time it.
2. Run all image classifications on a pure, correct CPU implementation, and time it.
3. Check if both QBART and the CPU implementation agree. If both implementations agree on all image classifications, we know that the QBART implementation is correct.
4. Present the results to the user.
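
The four steps above amount to a small harness: time each implementation over the same images, then compare the results element by element. A minimal sketch, assuming each implementation can be wrapped as a function that takes one image; `timed_classify`, `implementations_agree`, and the two stand-in classifiers are hypothetical names, not part of QBART:

```python
import time


def timed_classify(classify, images):
    """Run classify on every image; return (results, elapsed seconds)."""
    start = time.time()
    results = [classify(image) for image in images]
    return results, time.time() - start


def implementations_agree(results_a, results_b):
    """Return (all_agree, list of indices where the two result lists differ)."""
    if len(results_a) != len(results_b):
        raise ValueError("result lists differ in length")
    mismatches = [i for i, (a, b) in enumerate(zip(results_a, results_b)) if a != b]
    return len(mismatches) == 0, mismatches


# Stand-in classifiers, purely for illustration (not real QNN inference).
def fake_accelerated(image):
    return sum(image) % 3


def fake_reference(image):
    return sum(image) % 3


sample_images = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
accel_results, accel_time = timed_classify(fake_accelerated, sample_images)
ref_results, ref_time = timed_classify(fake_reference, sample_images)
agree, mismatches = implementations_agree(accel_results, ref_results)
print("agree: {}, mismatching indices: {}".format(agree, mismatches))
```

Returning the mismatching indices, rather than only a boolean, makes it easier to see which images the two implementations disagree on.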



TODO(N35N0M): Implement Yaman's GEMMBitserial as a separate CPU-based implementation. It does the common case (matrix multiplication) very fast. We like fast.

TODO(N35N0M): Currently, it seems that every run of the code creates an additional root logger. Fix the code so that it either checks for an existing logger on a new run, or safely destroys all loggers before running again.
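
A common cause of this symptom in notebooks is that a new handler gets attached every time the cell is re-run, so each message is printed once per handler. Assuming that is what happens here, one way to guard against it with the standard `logging` module is sketched below; `get_qbart_logger` is a hypothetical helper name, not an existing function in `qbart_helper`:

```python
import logging


def get_qbart_logger(name="qbart"):
    """Return a named logger, attaching a handler only on the first call."""
    logger = logging.getLogger(name)  # returns the same object for the same name
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        # Keep records from also being emitted by handlers on the root logger.
        logger.propagate = False
    return logger


first = get_qbart_logger()
second = get_qbart_logger()  # simulates re-running the notebook cell
first.info("logger has %d handler(s)", len(first.handlers))
```

Because `logging.getLogger` returns the same logger object for a given name, checking `logger.handlers` before adding one is enough to keep repeated runs from stacking handlers.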

TODO: We should specify how the input QNNs should be constructed, ask Yaman?
It is nice to have if we actually want others to be able to use QBART later.

TODO: Fix logger so that it actually works. Should be a helper file in qbart_helper.

# Step 1: Running all image classifications on QBART

In [2]:
"""
CREDIT WHERE CREDIT IS DUE
Parts of the code are simply reused from the tutorial code provided by the course instructors.
It should be available at: https://github.com/maltanar/qnn-inference-examples.

Some Python-specific problems have been solved with Stack Overflow help,
and this is clearly credited in the relevant code sections.
"""

# Open source libraries
import time

# Custom functions for the project
from qbart_helper import *
from QNN import *
from client import classification_client

###########################################################################################################
### USER INPUT SECTION, USER MUST SUBMIT VALUES OR "None" WHERE APPLICABLE
###########################################################################################################


qnn_path = "gtsrb-w1a1.pickle"         # Path to the pickled QNN, relative to where the notebook resides.
image_dir = "Images"                   # Image directory, relative to where the notebook resides.
image_limit = 100                      # Max number of images to run inference on; set None to use all.
image_channels = "RGB"                 # Must be specified in order. 'R', 'G' and 'B' combinations only.
image_data_layout = "rcC"              # Must be specified, r = row, c = column, C = Channel

qbart_data_layout = "Crc"              # Qbart assumes data to be in column major form.

qnn_trained_channels = "BGR"           # The channel ordering that the qnn is trained to.
qnn_trained_imsize_col = 32            # The expected column size of input images to the qnn.
qnn_trained_imsize_row = 32            # The expected row size of input images to the qnn.

# Cluster config
aase = '192.168.1.7'
bjorg = '192.168.1.4'
gunn = '192.168.1.2'
solfrid = '192.168.1.5'
server_list = [('localhost', 64646)] # localhost is a minimum, or the program won't run.

# Either specify image classes to get an easily readable name, or specify None to just get a category #.
image_classes = ['20 Km/h', '30 Km/h', '50 Km/h', '60 Km/h', '70 Km/h', '80 Km/h', 'End 80 Km/h', '100 Km/h', '120 Km/h', 'No overtaking', 'No overtaking for large trucks', 'Priority crossroad', 'Priority road', 'Give way', 'Stop', 'No vehicles', 'Prohibited for vehicles with a permitted gross weight over 3.5t including their trailers, and for tractors except passenger cars and buses', 'No entry for vehicular traffic', 'Danger Ahead', 'Bend to left', 'Bend to right', 'Double bend (first to left)', 'Uneven road', 'Road slippery when wet or dirty', 'Road narrows (right)', 'Road works', 'Traffic signals', 'Pedestrians in road ahead', 'Children crossing ahead', 'Bicycles prohibited', 'Risk of snow or ice', 'Wild animals', 'End of all speed and overtaking restrictions', 'Turn right ahead', 'Turn left ahead', 'Ahead only', 'Ahead or right only', 'Ahead or left only', 'Pass by on right', 'Pass by on left', 'Roundabout', 'End of no-overtaking zone', 'End of no-overtaking zone for vehicles with a permitted gross weight over 3.5t including their trailers, and for tractors except passenger cars and buses']

###########################################################################################################
###########################################################################################################

###########################################################################################################
### MAIN METHOD, SHOULD BE KEPT RELATIVELY SIMPLE, DETAILS STORED AWAY IN HELPER FUNCTIONS
###########################################################################################################

images = load_images(image_dir, image_limit, qnn_trained_imsize_col, qnn_trained_imsize_row, qbart_data_layout, qnn_trained_channels)
qnn = load_qnn(qnn_path)


starttime = time.time()
# We send the images to the processing server. Currently this is localhost only; later it can be
# localhost plus others, each with its own thread here in main or in the classification client.
qbart_classifications = classification_client(qnn, images, server_list)
# Flatten the nested result lists into one flat list of classifications.
qbart_classifications = [j for i in qbart_classifications for j in i]
endtime = time.time()

print("Time used for classification: " + str(endtime-starttime))

# Remember that we are just executing a QNN, so mispredictions are no indicator of failure/success.
# The classifications are only there so that we can test that QBART actually runs properly.
# This means that qbart_classifications alone tells us very little; we need one or several correct
# CPU implementations to compare against.
for i in range(len(qbart_classifications)):
    print(qbart_classifications[i][0], image_classes[qbart_classifications[i][1]])
    
###########################################################################################################
###########################################################################################################

603431
00000000000010010011010100100111
749727
('Size of image list that is now being sent:', '00000000000010110111000010011111')
The image list has been sent
Time used for classification: 4.31944799423
('Images/01818', 'Give way')
('Images/02227', '120 Km/h')
('Images/05562', 'Bicycles prohibited')
('Images/05553', 'No overtaking')
('Images/00327', 'Pass by on right')
('Images/08638', '30 Km/h')
('Images/09626', 'Priority road')
('Images/01618', 'Roundabout')
('Images/03199', '80 Km/h')
('Images/04071', '30 Km/h')
('Images/05579', '50 Km/h')
('Images/05512', 'Give way')
('Images/10834', 'Bicycles prohibited')
('Images/05743', 'Wild animals')
('Images/09125', 'No entry for vehicular traffic')
('Images/00359', 'No overtaking for large trucks')
('Images/04651', 'No overtaking for large trucks')
('Images/09931', '120 Km/h')
('Images/03881', 'No entry for vehicular traffic')
('Images/10678', 'Give way')
('Images/05348', '70 Km/h')
('Images/01979', '60 Km/h')
('Images/02059', 'No overtaking
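
The first lines of the output above suggest that the client announces the size of each payload as a 32-character binary string before sending it: `603431` and `749727` are exactly the decimal values of the two bit strings printed next to them. The actual `classification_client` is not shown here, so the following is only a sketch of such a fixed-width length header, with hypothetical function names:

```python
def encode_length(num_bytes):
    """Encode a payload size as a fixed-width, 32-character binary string."""
    if not 0 <= num_bytes < 2 ** 32:
        raise ValueError("size does not fit in 32 bits")
    return format(num_bytes, "032b")


def decode_length(header):
    """Inverse of encode_length: parse the 32-character binary string."""
    if len(header) != 32:
        raise ValueError("header must be exactly 32 characters")
    return int(header, 2)


header = encode_length(603431)
print(header)                 # 00000000000010010011010100100111
print(decode_length(header))  # 603431
```

A fixed-width header lets the receiver read exactly 32 characters first and then know how many bytes of payload to wait for.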

# Step 2: Running all image classifications on a CPU implementation 

## 2.1 Using the code from qnn-inference-examples (GTSRB only)
With some mods and assumptions in order to process a lot of images instead of just one.
This also required some modding of the provided GTSRB_predict.

In [None]:
import cPickle as pickle
from PIL import Image
import numpy as np
from QNN import *
from time import time
from QNN.layers import *
from qbart_helper import *

gtsrb_classes = ['20 Km/h', '30 Km/h', '50 Km/h', '60 Km/h', '70 Km/h', '80 Km/h', 'End 80 Km/h', '100 Km/h', '120 Km/h', 'No overtaking', 'No overtaking for large trucks', 'Priority crossroad', 'Priority road', 'Give way', 'Stop', 'No vehicles', 'Prohibited for vehicles with a permitted gross weight over 3.5t including their trailers, and for tractors except passenger cars and buses', 'No entry for vehicular traffic', 'Danger Ahead', 'Bend to left', 'Bend to right', 'Double bend (first to left)', 'Uneven road', 'Road slippery when wet or dirty', 'Road narrows (right)', 'Road works', 'Traffic signals', 'Pedestrians in road ahead', 'Children crossing ahead', 'Bicycles prohibited', 'Risk of snow or ice', 'Wild animals', 'End of all speed and overtaking restrictions', 'Turn right ahead', 'Turn left ahead', 'Ahead only', 'Ahead or right only', 'Ahead or left only', 'Pass by on right', 'Pass by on left', 'Roundabout', 'End of no-overtaking zone', 'End of no-overtaking zone for vehicles with a permitted gross weight over 3.5t including their trailers, and for tractors except passenger cars and buses']



# Here we assume that the images are in channel, row, column layout, in BGR color.
#tutorial_classifications = []

#tutorial_start = time()
#for image in images:
#    tutorial_classifications.append(GTSRB_predict(image))
    
#tutorial_stop = time()

#tutorial_time_total = tutorial_stop - tutorial_start

# Tutorial code galore.
def prepare_gtsrb(img):
    # make sure the image is the size expected by the network
    img = img.resize((32, 32))
    display(img)
    # convert to numpy array
    img = np.asarray(img)
    # we need the data layout to be (channels, rows, columns)
    # but it comes in (rows, columns, channels) format, so we
    # need to transpose the axes:
    img = img.transpose((2, 0, 1))
    # finally, our network is trained with BGR instead of RGB images,
    # so we need to invert the order of channels in the channel axis:
    img = img[::-1, :, :]
    return img

# load the trained QNN (the test images were already loaded and prepared in Step 1)

qnn = pickle.loads(load_qnn("gtsrb-w1a1.pickle"))


def gtsrb_predict(img):
    # get the predictions array
    res = predict(qnn, img)
    # return the index of the largest prediction, then use the
    # classes array to map to a human-readable string
    winner_ind = np.argmax(res)
    winner_class = gtsrb_classes[winner_ind]
    # the sum of the output values add up to 1 due to softmax,
    # so we can interpret them as probabilities
    return winner_class

# Classify only the first four images for now.
qnn_classifications = [gtsrb_predict(image) for image in images[:4]]

print (qnn_classifications)
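
The transpose and channel flip in `prepare_gtsrb` are one specific case of the layout strings used in the user input section (`"rcC"` to `"Crc"`, `"RGB"` to `"BGR"`). A minimal NumPy sketch of the general conversion follows; `convert_layout` and `reorder_channels` are hypothetical helpers written for illustration and are not taken from `qbart_helper`:

```python
import numpy as np


def convert_layout(img, src_layout, dst_layout):
    """Permute axes from src_layout to dst_layout, e.g. "rcC" -> "Crc"."""
    if sorted(src_layout) != sorted(dst_layout) or len(set(src_layout)) != len(src_layout):
        raise ValueError("layouts must be permutations of the same distinct axis letters")
    # For each destination axis, find where that axis sits in the source.
    axes = [src_layout.index(axis) for axis in dst_layout]
    return np.transpose(img, axes)


def reorder_channels(img, src_channels, dst_channels, channel_axis=0):
    """Reorder the channel axis, e.g. "RGB" -> "BGR"."""
    order = [src_channels.index(channel) for channel in dst_channels]
    return np.take(img, order, axis=channel_axis)


# Tiny stand-in image: 2 rows, 3 columns, 3 channels ("rcC" layout, RGB order).
rgb_rows_cols = np.arange(2 * 3 * 3).reshape((2, 3, 3))
crc = convert_layout(rgb_rows_cols, "rcC", "Crc")
bgr = reorder_channels(crc, "RGB", "BGR")
print(crc.shape)  # (3, 2, 3)
```

For `"rcC"` to `"Crc"` the computed permutation is `(2, 0, 1)`, the same one `prepare_gtsrb` passes to `transpose`, and `"RGB"` to `"BGR"` reverses the channel axis just like `img[::-1, :, :]`.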

## 2.2 Using Yaman's GEMMBitserial implementation (much faster?)

In [None]:
# TODO(N35N0M): Implement this, and time it. - https://github.com/maltanar/gemmbitserial
# C++ library, which means we have to do the following:
# Mod the most relevant layers that use matrix multiplication (hint hint convolution and fc hinthint)
# ??
# Great success

# Haven't done this before, but several suggestions can be found in: http://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/c++-wrapping.html
# 
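
Until the C++ library is wrapped, the idea behind bit-serial matrix multiplication can be illustrated in plain NumPy: split each few-bit integer matrix into binary bit planes, multiply the planes pairwise, and weight each partial product by the significance of the two bits. The sketch below is not the gemmbitserial API and will not be fast; it assumes unsigned operands, and `bitserial_matmul` is a hypothetical name:

```python
import numpy as np


def bitserial_matmul(lhs, rhs, lhs_bits, rhs_bits):
    """Multiply unsigned integer matrices one pair of bit planes at a time."""
    lhs = np.asarray(lhs, dtype=np.int64)
    rhs = np.asarray(rhs, dtype=np.int64)
    result = np.zeros((lhs.shape[0], rhs.shape[1]), dtype=np.int64)
    for i in range(lhs_bits):
        lhs_plane = (lhs >> i) & 1      # binary matrix holding bit i of lhs
        for j in range(rhs_bits):
            rhs_plane = (rhs >> j) & 1  # binary matrix holding bit j of rhs
            # Binary-by-binary product, weighted by 2**(i + j).
            result += np.dot(lhs_plane, rhs_plane) << (i + j)
    return result


a = np.array([[3, 1], [2, 0]])  # all values fit in 2 bits
b = np.array([[1, 2], [3, 1]])
product = bitserial_matmul(a, b, lhs_bits=2, rhs_bits=2)
print(np.array_equal(product, np.dot(a, b)))  # True
```

The cost grows with the product of the two bit widths, which is why this approach pays off mainly for the low-precision (for example 1-bit or 2-bit) weights and activations used in QNNs.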



# Step 3: Implementation correctness testing

In [None]:
print(qbart_classifications)
print(qnn_classifications)

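# qbart_classifications holds (image path, class index) tuples, while qnn_classifications holds
# class names for the first few images only. Map the indices to names and compare the part that
# overlaps. This assumes both lists follow the order of `images`.
qbart_names = [image_classes[c[1]] for c in qbart_classifications[:len(qnn_classifications)]]
if qnn_classifications == qbart_names: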
    print("Holy Glomgold! It works!")
    # TODO: Present time used here. Perhaps energy later as well?
else:
    print("Uh-oh, something must'ave gone wrong somewhere!")

# Step 4: Presentation of results

In [None]:
# TODO: Present results. The most important one here is just presenting the various times.
# Not important for MVP rly, but if there is time (HA!), please fix.
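
One simple way to present the times is a small text table with the speedup of each implementation relative to a chosen baseline. The sketch below uses a hypothetical `format_timing_report` helper, and the numbers passed to it are placeholders for illustration, not measurements:

```python
def format_timing_report(timings, baseline):
    """Return a text table of run times and speedups relative to baseline."""
    base_time = float(timings[baseline])
    lines = ["{:<16}{:>12}{:>10}".format("implementation", "time (s)", "speedup")]
    for name in sorted(timings):
        seconds = timings[name]
        lines.append("{:<16}{:>12.3f}{:>9.2f}x".format(name, seconds, base_time / seconds))
    return "\n".join(lines)


# Placeholder values only; substitute the measured times from Steps 1 and 2.
report = format_timing_report({"qbart": 4.0, "cpu": 8.0}, baseline="cpu")
print(report)
```

A speedup below 1.00x means the implementation is slower than the baseline, which is a possible outcome for the MVP.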