# PYNQ_301: OBJECT DETECTION
----

The aim of this notebook is to:
- Understand what YOLOv3 and VOC are.
- Learn how to use the YOLOv3 model to detect objects in an image and a webcam feed.
- Learn how to display information on an OLED.

![image.png](attachment:image.png)

## What is YOLO?
----

![image-2.png](attachment:image-2.png)

You Only Look Once (YOLO) is a convolutional neural network (CNN) architecture for real-time object detection. CNNs process input images as structured arrays of pixel data and learn to recognize spatial patterns in them. YOLO predicts bounding boxes and class probabilities for the whole image in a single forward pass, which makes it fast enough for live video.<br>

## What is VOC?
----
VOC is the dataset from the PASCAL Visual Object Classes (VOC) Challenge.
The VOC2012 train/val set contains 11,530 images; each image contains objects from 20 different classes, for a total of 27,450 annotated objects.<br>
![image-3.png](attachment:image-3.png)<br>
![image-4.png](attachment:image-4.png)

# Hardware Setup
----
### 1. KRIA KV260 Board<br> ![image-2.png](attachment:image-2.png)
### 2. PYNQ Grove Adapter<br> ![image.png](attachment:image.png)
### 3. OLED Display<br> ![image-3.png](attachment:image-3.png)
### 4. Webcam<br>![image-5.png](attachment:image-5.png)

# Let's get started 
----

## 1. Prepare the overlay
We will download the overlay onto the board. 

In [None]:
from pynq_dpu import DpuOverlay
from pynq_peripherals import PmodGroveAdapter
overlay = DpuOverlay("dpu.bit")

In [None]:
# Initialize the Pmod Grove adapter. G3 indicates that the OLED is connected to the G3 slot on the PYNQ Grove Adapter
adapter = PmodGroveAdapter(overlay.PMODA, G3='grove_oled')

## 2. Utility functions

In this section, we will prepare a few functions for later use.

In [None]:
# Import necessary libraries
import os
import time
import numpy as np
import cv2
import random
import colorsys
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt
from IPython.display import display, Image
%matplotlib inline

Now we'll load the YOLOv3 xmodel, which was trained on the VOC dataset.

In [None]:
overlay.load_model("tf_yolov3_voc.xmodel")

The YOLOv3 model predicts offsets relative to a predetermined set of boxes with fixed width-height ratios, known as anchor boxes. The list below gives nine anchors as consecutive (width, height) pairs in pixels. Let's define them.

In [None]:
anchor_list = [10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326]
anchor_float = [float(x) for x in anchor_list]
anchors = np.array(anchor_float).reshape(-1, 2)
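
To make the layout of `anchor_list` concrete, here is a minimal sketch in plain Python (no NumPy) that pairs the flat list into (width, height) tuples and groups them by output scale. The grouping follows the `anchor_mask` used later in `evaluate`. The names `anchor_pairs`, `grid_sizes` and `anchors_by_grid` are illustrative, and the grid sizes assume a 416x416 model input.

```python
# Flat anchor list from the cell above: width, height, width, height, ...
anchor_list = [10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326]

# Pair consecutive values into (width, height) tuples, in pixels.
anchor_pairs = list(zip(anchor_list[0::2], anchor_list[1::2]))

# Same mask as in evaluate(): the coarsest grid gets the largest anchors.
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
grid_sizes = [13, 26, 52]  # assumed output grids for a 416x416 input

anchors_by_grid = {
    grid: [anchor_pairs[i] for i in mask]
    for grid, mask in zip(grid_sizes, anchor_mask)
}

for grid, group in anchors_by_grid.items():
    print(f"{grid}x{grid} grid -> anchors {group}")
```

Large anchors are matched to the coarse 13x13 grid because each of its cells covers a larger part of the image.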

The VOC dataset defines 20 object classes that the model can detect. The file voc_classes.txt lists them, one per line.

![image.png](attachment:image.png)<br>
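
For reference, here is a sketch of what the class list is expected to look like once parsed. The 20 names are the standard PASCAL VOC categories in alphabetical order. That `img/voc_classes.txt` lists them in exactly this order is an assumption, so check the file on your board. `parse_classes` is a hypothetical helper that, unlike the notebook's loader, also skips blank lines.

```python
import io

# Standard PASCAL VOC class names (assumed order of voc_classes.txt).
VOC_CLASSES_TEXT = """aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
"""

def parse_classes(handle):
    """Read one class name per line from a file-like object, skipping blank lines."""
    return [line.strip() for line in handle if line.strip()]

voc_names = parse_classes(io.StringIO(VOC_CLASSES_TEXT))
print(len(voc_names), "classes; index of 'person':", voc_names.index("person"))
```

The position of a name in this list is the class index that the model's output refers to, so the file order must match the order used during training.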

In [None]:
# Get Model Classification Information
def get_class(classes_path):
    with open(classes_path) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names
    
classes_path = "img/voc_classes.txt"
class_names = get_class(classes_path)

### To make the model output easier to interpret, we draw a bounding box around each detected object and display a score representing the probability that the object belongs to a specific class.
![image.png](attachment:image.png)
### We can associate each class with a specific color. The cell below does this for us.

In [None]:
# Define unique colors for each class
num_classes = len(class_names)
hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]
colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
colors = list(map(lambda x: 
                  (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 
                  colors))
random.seed(0)
random.shuffle(colors)
random.seed(None)
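
The color scheme can be checked on a tiny example. This sketch spreads hues evenly around the HSV color wheel, as the cell above does, for a hypothetical four-class case. `class_colors` is an illustrative helper, not part of the notebook.

```python
import colorsys

def class_colors(num_classes):
    """Return one (R, G, B) tuple in 0..255 per class, with evenly spaced hues."""
    colors = []
    for index in range(num_classes):
        hue = index / num_classes
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)  # floats in 0..1
        colors.append((int(r * 255), int(g * 255), int(b * 255)))
    return colors

palette = class_colors(4)
print(palette)
```

The notebook then shuffles the list with a fixed seed, so neighboring class indices do not end up with similar hues and the assignment is the same on every run.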


### Let's define some useful functions which will be used later in the notebook.
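
Before reading the helpers, it may help to see the letterbox arithmetic on its own. This sketch reproduces only the geometry used by `letterbox_image` (scale, resized size, padding) without touching any pixels. `letterbox_geometry` is a hypothetical name, and the 640x480 frame size is taken from the webcam settings later in the notebook.

```python
def letterbox_geometry(image_w, image_h, target_w, target_h):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    # Use the smaller ratio so the whole image fits inside the target.
    scale = min(target_w / image_w, target_h / image_h)
    new_w = int(image_w * scale)
    new_h = int(image_h * scale)
    # Center the resized image; the remaining border is filled with gray (128).
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top

# A 640x480 webcam frame fitted into the 416x416 model input.
geometry = letterbox_geometry(640, 480, 416, 416)
print(geometry)
```

For this frame the image is scaled to 416x312 and padded with 52 gray rows above and below, so the aspect ratio is unchanged.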

In [None]:
# This function resizes the image with unchanged aspect ratio using padding.
def letterbox_image(image, size):
    ih, iw, _ = image.shape
    w, h = size
    scale = min(w/iw, h/ih)
    
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = cv2.resize(image, (nw,nh), interpolation=cv2.INTER_LINEAR)
    new_image = np.ones((h,w,3), np.uint8) * 128
    h_start = (h-nh)//2
    w_start = (w-nw)//2
    new_image[h_start:h_start+nh, w_start:w_start+nw, :] = image
    return new_image

# This function pre-processes an image: it converts BGR to RGB, letterboxes it to the model input size, scales pixel values to [0, 1], and adds a batch dimension.
def pre_process(image, model_image_size):
    image = image[...,::-1]
    image_h, image_w, _ = image.shape
 
    if model_image_size != (None, None):
        assert model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert model_image_size[1]%32 == 0, 'Multiples of 32 required'
        boxed_image = letterbox_image(image, tuple(reversed(model_image_size)))
    else:
        new_image_size = (image_w - (image_w % 32), image_h - (image_h % 32))
        boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')
    image_data /= 255.
    image_data = np.expand_dims(image_data, 0) 
    return image_data

# This function decodes a raw feature map into normalized box centers, box sizes, objectness confidence, and class probabilities
def _get_feats(feats, anchors, num_classes, input_shape):
    num_anchors = len(anchors)
    anchors_tensor = np.reshape(np.array(anchors, dtype=np.float32), [1, 1, 1, num_anchors, 2])
    grid_size = np.shape(feats)[1:3]
    nu = num_classes + 5
    predictions = np.reshape(feats, [-1, grid_size[0], grid_size[1], num_anchors, nu])
    grid_y = np.tile(np.reshape(np.arange(grid_size[0]), [-1, 1, 1, 1]), [1, grid_size[1], 1, 1])
    grid_x = np.tile(np.reshape(np.arange(grid_size[1]), [1, -1, 1, 1]), [grid_size[0], 1, 1, 1])
    grid = np.concatenate([grid_x, grid_y], axis = -1)
    grid = np.array(grid, dtype=np.float32)

    box_xy = (1/(1+np.exp(-predictions[..., :2])) + grid) / np.array(grid_size[::-1], dtype=np.float32)
    box_wh = np.exp(predictions[..., 2:4]) * anchors_tensor / np.array(input_shape[::-1], dtype=np.float32)
    box_confidence = 1/(1+np.exp(-predictions[..., 4:5]))
    box_class_probs = 1/(1+np.exp(-predictions[..., 5:]))
    return box_xy, box_wh, box_confidence, box_class_probs


# This function maps boxes from the letterboxed model input back to pixel coordinates in the original image
def correct_boxes(box_xy, box_wh, input_shape, image_shape):
    box_yx = box_xy[..., ::-1]
    box_hw = box_wh[..., ::-1]
    input_shape = np.array(input_shape, dtype = np.float32)
    image_shape = np.array(image_shape, dtype = np.float32)
    new_shape = np.around(image_shape * np.min(input_shape / image_shape))
    offset = (input_shape - new_shape) / 2. / input_shape
    scale = input_shape / new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes = np.concatenate([
        box_mins[..., 0:1],
        box_mins[..., 1:2],
        box_maxes[..., 0:1],
        box_maxes[..., 1:2]
    ], axis = -1)
    boxes *= np.concatenate([image_shape, image_shape], axis = -1)
    return boxes

# This function is used to get information on the valid objects detected and their scores
def boxes_and_scores(feats, anchors, classes_num, input_shape, image_shape):
    box_xy, box_wh, box_confidence, box_class_probs = _get_feats(feats, anchors, classes_num, input_shape)
    boxes = correct_boxes(box_xy, box_wh, input_shape, image_shape)
    boxes = np.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs
    box_scores = np.reshape(box_scores, [-1, classes_num])
    return boxes, box_scores


# This function is used to draw bounding boxes on the image.
def draw_bbox(image, bboxes, classes):
    """
    bboxes: [x_min, y_min, x_max, y_max, probability, cls_id] format coordinates.
    """
    num_classes = len(classes)
    image_h, image_w, _ = image.shape
    hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))

    random.seed(0)
    random.shuffle(colors)
    random.seed(None)

    for i, bbox in enumerate(bboxes):
        coor = np.array(bbox[:4], dtype=np.int32)
        fontScale = 0.5
        score = bbox[4]
        class_ind = int(bbox[5])
        bbox_color = colors[class_ind]
        bbox_thick = int(0.6 * (image_h + image_w) / 600)
        c1, c2 = (coor[0], coor[1]), (coor[2], coor[3])
        cv2.rectangle(image, c1, c2, bbox_color, bbox_thick)
    return image

# This function suppresses non-maximal boxes
def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.

    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.

    # Returns
        keep: ndarray, index of effective boxes.
    """
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2-x1+1)*(y2-y1+1)
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 1)
        h1 = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= 0.55)[0]  # threshold
        order = order[inds + 1]

    return keep

# This function plots the image with matplotlib and overlays a labeled box for each detection.
def draw_boxes(image, boxes, scores, classes):
    _, ax = plt.subplots(1)
    ax.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    image_h, image_w, _ = image.shape

    for i, bbox in enumerate(boxes):
        [top, left, bottom, right] = bbox
        width, height = right - left, bottom - top
        center_x, center_y = left + width*0.5, top + height*0.5
        score, class_index = scores[i], classes[i]
        label = '{}: {:.4f}'.format(class_names[class_index], score) 
        color = tuple([color/255 for color in colors[class_index]])
        ax.add_patch(Rectangle((left, top), width, height,
                               edgecolor=color, facecolor='none'))
        ax.annotate(label, (center_x, center_y), color=color, weight='bold', 
                    fontsize=12, ha='center', va='center')
    return ax

# This function evaluates the scores and makes predictions based on provided class names
def evaluate(yolo_outputs, image_shape, class_names, anchors):
    score_thresh = 0.2
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    boxes = []
    box_scores = []
    input_shape = np.shape(yolo_outputs[0])[1 : 3]
    input_shape = np.array(input_shape)*32

    for i in range(len(yolo_outputs)):
        _boxes, _box_scores = boxes_and_scores(
            yolo_outputs[i], anchors[anchor_mask[i]], len(class_names), 
            input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = np.concatenate(boxes, axis = 0)
    box_scores = np.concatenate(box_scores, axis = 0)

    mask = box_scores >= score_thresh
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(len(class_names)):
        class_boxes_np = boxes[mask[:, c]]
        class_box_scores_np = box_scores[:, c]
        class_box_scores_np = class_box_scores_np[mask[:, c]]
        nms_index_np = nms_boxes(class_boxes_np, class_box_scores_np) 
        class_boxes_np = class_boxes_np[nms_index_np]
        class_box_scores_np = class_box_scores_np[nms_index_np]
        classes_np = np.ones_like(class_box_scores_np, dtype = np.int32) * c
        boxes_.append(class_boxes_np)
        scores_.append(class_box_scores_np)
        classes_.append(classes_np)
    boxes_ = np.concatenate(boxes_, axis = 0)
    scores_ = np.concatenate(scores_, axis = 0)
    classes_ = np.concatenate(classes_, axis = 0)

    return boxes_, scores_, classes_
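
The suppression step in `nms_boxes` can be followed more easily on a toy example. Below is a minimal pure-Python sketch of the same greedy procedure, using the same inclusive-pixel (+1) area convention and the 0.55 overlap threshold. `iou`, `greedy_nms` and the demo boxes are illustrative and not part of the notebook.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes, inclusive pixels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1) + 1)
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1) + 1)
    inter = inter_w * inter_h
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_b = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_threshold=0.55):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

demo_boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
demo_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(demo_boxes, demo_scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed, while the distant third box survives. In `evaluate` this is run once per class, so boxes of different classes never suppress each other.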

In [None]:
# Collect the files in the 'img' directory whose names end with "JPEG" and count them
image_folder = 'img'
original_images = [i for i in os.listdir(image_folder) if i.endswith("JPEG")]
total_images = len(original_images)


## 3. Object Detection on an Image from the Filesystem
The next few cells set up the DPU runner and the input and output buffers used for our predictions.

In [None]:
# Setup input and output tensors
dpu = overlay.runner
inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()

shapeIn = tuple(inputTensors[0].dims)

shapeOut0 = (tuple(outputTensors[0].dims)) # (1, 13, 13, 75)
shapeOut1 = (tuple(outputTensors[1].dims)) # (1, 26, 26, 75)
shapeOut2 = (tuple(outputTensors[2].dims)) # (1, 52, 52, 75)

outputSize0 = int(outputTensors[0].get_data_size() / shapeIn[0]) # 12675
outputSize1 = int(outputTensors[1].get_data_size() / shapeIn[0]) # 50700
outputSize2 = int(outputTensors[2].get_data_size() / shapeIn[0]) # 202800

# Setup Buffers
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
output_data = [np.empty(shapeOut0, dtype=np.float32, order="C"), 
               np.empty(shapeOut1, dtype=np.float32, order="C"),
               np.empty(shapeOut2, dtype=np.float32, order="C")]
image = input_data[0]
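
The shapes and sizes in the comments above follow from simple arithmetic. As a sketch, assuming a 416x416 input and downsampling factors of 32, 16 and 8 for the three output heads (the factors are an assumption consistent with the 13, 26 and 52 grids), each grid cell holds 3 anchors times (4 box values + 1 objectness + 20 class scores):

```python
NUM_CLASSES = 20                       # VOC classes
ANCHORS_PER_SCALE = 3                  # from anchor_mask in evaluate()
VALUES_PER_BOX = 4 + 1 + NUM_CLASSES   # x, y, w, h, objectness, class scores

channels = ANCHORS_PER_SCALE * VALUES_PER_BOX
input_size = 416
strides = [32, 16, 8]  # assumed downsampling factors of the three heads

layout = []
for stride in strides:
    grid = input_size // stride
    elements = grid * grid * channels
    candidates = grid * grid * ANCHORS_PER_SCALE
    layout.append((grid, elements, candidates))
    print(f"stride {stride}: shape (1, {grid}, {grid}, {channels}), "
          f"{elements} values, {candidates} candidate boxes")

total_candidates = sum(c for _, _, c in layout)
print("candidate boxes per image:", total_candidates)
```

This is why score thresholding and non-maximum suppression are needed afterwards: the model always emits the same number of candidate boxes, whatever the image contains.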

### The function defined below is the main function: it pre-processes a frame, runs the model on the DPU, and decodes the output.

In [None]:
# Function to perform pre-processing, model predictions and decoding output
def run(frame, display=False):
    
    # Pre-processing
    image_size = frame.shape[:2]
    image_data = np.array(pre_process(frame, (416, 416)), dtype=np.float32)
    
    # Copy the data into the DPU input buffer and run the DPU
    image[0,...] = image_data.reshape(shapeIn[1:])
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    
    # Retrieve output data
    conv_out0 = np.reshape(output_data[0], shapeOut0)
    conv_out1 = np.reshape(output_data[1], shapeOut1)
    conv_out2 = np.reshape(output_data[2], shapeOut2)
    yolo_outputs = [conv_out0, conv_out1, conv_out2]
    
    # Decode output from YOLOv3
    boxes, scores, classes = evaluate(yolo_outputs, image_size, class_names, anchors)
    
    if display:
        _ = draw_boxes(frame, boxes, scores, classes)
        
    return boxes, scores, classes
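
The "decode" step may be easier to follow for a single prediction. This is a scalar sketch of the arithmetic that `_get_feats` applies to whole arrays: a sigmoid plus the cell index gives the box center, and an exponential times the anchor gives the box size. `decode_cell` and its argument names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(raw, col, row, grid_size, anchor_wh, input_size):
    """Decode one raw prediction (tx, ty, tw, th, tobj) into normalized values."""
    tx, ty, tw, th, tobj = raw
    # Center: offset inside the cell (0..1) plus the cell index, as a fraction of the grid.
    center_x = (sigmoid(tx) + col) / grid_size
    center_y = (sigmoid(ty) + row) / grid_size
    # Size: the anchor scaled by exp(t), as a fraction of the model input.
    width = math.exp(tw) * anchor_wh[0] / input_size
    height = math.exp(th) * anchor_wh[1] / input_size
    return center_x, center_y, width, height, sigmoid(tobj)

# All-zero raw outputs in the middle cell of the 13x13 grid, anchor (116, 90).
decoded = decode_cell((0, 0, 0, 0, 0), col=6, row=6, grid_size=13,
                      anchor_wh=(116, 90), input_size=416)
print(decoded)
```

With all-zero raw values the box sits in the center of its cell, has exactly the anchor's size, and has an objectness of 0.5. The final score of a box is this objectness multiplied by the class probability, which `evaluate` compares against `score_thresh`.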

In [None]:
# Read an input image from the "img" directory
input_image = cv2.imread(os.path.join(image_folder, original_images[4]))

In [None]:
# Perform pre-processing, model predictions and decode the output from the image
run(input_image, display=True)

## 4. Object Detection by Using a Webcam

In [None]:
# Start capturing a video 
videoIn = cv2.VideoCapture(0)
videoIn.set(cv2.CAP_PROP_FRAME_WIDTH, 640);
videoIn.set(cv2.CAP_PROP_FRAME_HEIGHT, 480);

print("Capture device is open: " + str(videoIn.isOpened()))

In [None]:
# Extract the frame from the video feed
ret, frame = videoIn.read()

In [None]:
# Perform predictions on the frame
boxes, scores, classes = run(frame, display=True)

In [None]:
# Find the index of the detection with the highest score and look up its class name.
# Note: np.argmax raises a ValueError if nothing was detected (scores is empty).
best_score = np.argmax(scores)
class_names[classes[best_score]]
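
If the frame contains no detections, the lookup above fails. One possible way to guard against that is sketched below. `best_detection` is a hypothetical helper, and the shortened `names` list is only for illustration.

```python
def best_detection(scores, classes, class_names):
    """Return (label, score) for the highest-scoring detection, or None if empty."""
    if len(scores) == 0:
        return None
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[classes[best_index]], float(scores[best_index])

names = ["aeroplane", "bicycle", "bird"]  # shortened, illustrative list
found = best_detection([0.31, 0.87], [2, 0], names)
nothing = best_detection([], [], names)
print(found)
print(nothing)
```

The helper only uses `len` and indexing, so it should also accept the NumPy arrays returned by `run`.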

We will now print the object with the highest score on the OLED. 

In [None]:
oled = adapter.G3
oled.set_default_config()
oled.set_normal_display()
oled.put_string("Detected") 
oled.set_position(2, 0)
oled.put_string(f"{class_names[classes[best_score]]}")

## Example OLED output when a monitor is detected:

![image.png](attachment:image.png)


## 5. Real Time Object Detection from webcam

In [None]:
videoIn.release()
cap = cv2.VideoCapture(0)

In [None]:
display_handle=display(None, display_id=True)

We keep track of only the best-scoring object in each frame.

In [None]:
last_object = "None"
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale, font_thickness = 1, 1

# while True:
for i in range(200):
    ret, frame = cap.read()
    if not ret:
        # Stop if the webcam did not deliver a frame
        break
    boxes, scores, classes = run(frame)
    
    if scores.any():

        best_score = np.argmax(scores)
        label = class_names[classes[best_score]]

        # Draw bounding box (boxes are ordered y_min, x_min, y_max, x_max)
        y_min, x_min, y_max, x_max = map(int, boxes[best_score])
        frame = cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), color=255)

        # Label: measure and draw the text with the same font settings
        text = f"{label}: {scores[best_score]:.2f}"
        text_size = cv2.getTextSize(text, font, font_scale, font_thickness)[0]
        # Keep the label inside the frame when the box touches the top edge
        text_y = max(y_min - text_size[1], text_size[1])
        frame = cv2.putText(frame, text, (x_min, text_y), font,
                            font_scale, 255, font_thickness, cv2.LINE_AA)

        _, jpeg = cv2.imencode('.jpeg', frame)
        display_handle.update(Image(data=jpeg.tobytes()))
        
        # Update the OLED only when the detected class changes
        if label != last_object:
            oled.clear_display()
            oled.put_string(label)
            last_object = label

# <h1><center>END OF NOTEBOOK</center></h1>



# CHALLENGES:
## Modify the code in sections 4 and 5 above in incremental steps:
- Draw a bounding box only when a person is detected.
- Draw a bounding box when a person is detected with a score of 0.8 and above.

In [None]:
# Enter Code Here..


Let's clear the overlay and the DPU-specific data:

In [None]:
del overlay
del dpu