### We will be using OpenCV to handle everything, from loading the network configuration and weights to displaying the results.

In [1]:
import cv2 as cv
from copy import deepcopy
import numpy as np
import src.utility as ut

### File paths

During the workshop, we've simplified things so that you only have to provide the name of the file you want to work with.

* Any **weight** files you want to use should be placed in the **data/weights** folder in this project
* Any **configuration** files you want to use should be placed in the **data/cfg** folder in this project
* Any **images** you want to use should be placed in the **data/images** folder in this project

In [2]:
weigths_path = "data/weights/"
cfg_path = "data/cfg/"
class_path = "data/"

### Download necessary files

If the files are already downloaded and placed correctly, no downloads will start.
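`ut.download` comes from the project's `src.utility` module, which is not shown in this notebook. A minimal sketch of the same skip-if-present behaviour, assuming only the Python standard library (`download_if_missing` is a hypothetical name, not the project's actual helper):

```python
import os
import urllib.request


def download_if_missing(url, folder, filename):
    """Download `url` into `folder/filename` unless that file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    destination = os.path.join(folder, filename)
    if os.path.exists(destination):
        print(f"{filename} already exists at the provided destination!")
        return False
    os.makedirs(folder, exist_ok=True)  # create data/weights etc. if needed
    urllib.request.urlretrieve(url, destination)
    return True
```

Checking for the file before opening any connection is what makes re-running the cell cheap: the large `yolov3.weights` file is only fetched once.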

In [3]:
ut.download("https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg", cfg_path, "yolov3.cfg")
ut.download("https://pjreddie.com/media/files/yolov3.weights", weigths_path, "yolov3.weights")
ut.download("https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names", class_path, "coco.names")

yolov3.cfg already exists at the provided destination!
yolov3.weights already exists at the provided destination!
coco.names already exists at the provided destination!


### Loading weights and configuration files, using OpenCV

Given the file names, the function `load` prepends the folder paths defined above and creates a network with `cv.dnn.readNet`.

In [4]:
def load(weights, cfg):
    weights = weigths_path + weights
    cfg = cfg_path + cfg
    return cv.dnn.readNet(weights, cfg)

### Extract output layers

The last preparation step for the network is to extract the names of its output layers. Their outputs contain the detections we will use to draw boxes on the image.
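`getLayerNames` returns a plain list of names, while `getUnconnectedOutLayers` returns 1-based layer indices whose shape depends on the OpenCV version (an N x 1 array in older releases, a flat array in newer 4.x releases). A version-independent sketch of the index-to-name mapping, run on made-up layer names so it works without loading a network (`output_layer_names` is a hypothetical helper):

```python
import numpy as np


def output_layer_names(layer_names, unconnected):
    """Map 1-based output-layer indices to layer names.

    `unconnected` may be an N x 1 array (older OpenCV) or a flat array of
    ints (newer OpenCV); flattening handles both shapes.
    """
    return [layer_names[i - 1] for i in np.array(unconnected).flatten()]


# Made-up layer names; a real YOLOv3 network has many more layers.
names = ["conv_0", "yolo_82", "conv_83", "yolo_94", "yolo_106"]
print(output_layer_names(names, np.array([[2], [4], [5]])))  # → ['yolo_82', 'yolo_94', 'yolo_106']
print(output_layer_names(names, np.array([2, 4, 5])))        # → ['yolo_82', 'yolo_94', 'yolo_106']
```

OpenCV 4.x also provides `net.getUnconnectedOutLayersNames()`, which returns the names directly and avoids the indexing altogether.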

In [5]:
def extract_output_layers(net):
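    names = net.getLayerNames()
    # getUnconnectedOutLayers() returns 1-based indices, shaped N x 1 in older
    # OpenCV releases and flat in newer ones; flattening handles both.
    return [names[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]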

### Loading classes

Next, we load the classes we will use to label the output. These could technically be any list of strings, as long as it has at least 80 entries: the network was originally trained on 80 classes, and the index of the predicted class (0 to 79) is used to look up its label in this list.
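The lookup described above can be sketched as follows. The parsing mirrors `prepare_classes` (one label per line, whitespace stripped); the length check is an addition for illustration and not part of the workshop code, and `read_class_names` is a hypothetical name:

```python
import io


def read_class_names(lines, num_classes=80):
    """Strip each line into a label and check there is one per class index."""
    labels = [line.strip() for line in lines]
    if len(labels) < num_classes:
        raise ValueError(f"expected at least {num_classes} labels, got {len(labels)}")
    return labels


# Hypothetical stand-in for coco.names: any 80 strings will do.
fake_names = io.StringIO("\n".join(f"label_{i}" for i in range(80)))
labels = read_class_names(fake_names)
class_id = 17  # e.g. the argmax over the 80 class scores of one detection
print(labels[class_id])  # → label_17
```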

In [6]:
def prepare_classes(classes):
    with open(class_path + classes, "r") as f:
        return [line.strip() for line in f]

### Creating a blob from image

This function expects an image that has already been loaded (the utility function for loading one is used later) and converts it into a blob: a 4-D array with shape (batch, channels, height, width), which is the input format the network expects.

A few things to note about this:
* **size** defines the dimensions (width, height) of the final input we send to the network. With `crop=False` the image is simply resized (stretched) to this size rather than cropped.
* **scalefactor** multiplies every pixel value. Each channel holds values from 0 to 255, so scaling by 1/255 maps them to the range [0, 1], which is the input range YOLO expects.
* **swapRB** swaps the R and B channels, because OpenCV stores images in BGR order rather than RGB.
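To make these parameters concrete, here is a NumPy-only sketch of the scaling, channel swap and layout change that `blobFromImage` performs. It assumes the image already has the target size, so the resize step is left out, and `blob_like` is a hypothetical name rather than an OpenCV function:

```python
import numpy as np


def blob_like(image_bgr, scalefactor=1 / 255):
    """Approximate cv.dnn.blobFromImage(..., swapRB=True, crop=False) for an
    image that already has the target height and width (no resizing)."""
    rgb = image_bgr[:, :, ::-1]                    # swapRB: BGR -> RGB
    scaled = rgb.astype(np.float32) * scalefactor  # 0..255 -> 0.0..1.0
    chw = np.transpose(scaled, (2, 0, 1))          # HWC -> CHW
    return chw[np.newaxis, ...]                    # add batch axis -> NCHW


image = np.zeros((416, 416, 3), dtype=np.uint8)
image[:, :, 0] = 255                               # pure blue in BGR order
blob = blob_like(image)
print(blob.shape)        # → (1, 3, 416, 416)
print(blob[0, :, 0, 0])  # → [0. 0. 1.]  (blue is now the last channel)
```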

**Important note about size:**

If you decide to change this parameter later in the workshop (which you should try), note that YOLOv3 downsamples the input by a factor of 32. This means both the width and the height of the input must be multiples of 32, for example 320, 416 or 608.
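A small sketch for picking a valid size: snap an arbitrary width and height to the nearest multiples of 32, and compute the detection grid each YOLOv3 scale produces (strides 32, 16 and 8). Both helper names are hypothetical:

```python
def snap_to_stride(width, height, stride=32):
    """Round width and height to the nearest positive multiple of `stride`
    (exact ties follow Python's round, i.e. go to the even multiple)."""
    def snap(v):
        return max(stride, int(round(v / stride)) * stride)
    return snap(width), snap(height)


def grid_sizes(width, height, strides=(32, 16, 8)):
    """Grid (columns, rows) of each YOLOv3 detection scale for a valid input size."""
    if width % 32 or height % 32:
        raise ValueError("width and height must be multiples of 32")
    return [(width // s, height // s) for s in strides]


print(snap_to_stride(500, 375))  # → (512, 384)
print(grid_sizes(416, 416))      # → [(13, 13), (26, 26), (52, 52)]
```

Since `size` is given as (width, height), the snapped tuple could be passed straight on, e.g. `image_to_blob(img, size=snap_to_stride(500, 375))`.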

In [7]:
def image_to_blob(image, size=(416, 416)):
    return cv.dnn.blobFromImage(image, scalefactor=1/255, size=size, swapRB=True, crop=False)

### Forwarding the image

This is the core step: it sets the blob as the network's input, runs a forward pass, and returns the outputs of the requested output layers.
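For `yolov3.cfg`, each of the three output layers returns a 2-D array with one row per candidate box. Assuming 3 anchors per scale and the 80 COCO classes, every row holds 4 box values, 1 objectness score and 80 class scores (85 values in total), and the number of rows follows from the grid size at each stride. A sketch of that arithmetic (`expected_output_shapes` is a hypothetical helper):

```python
def expected_output_shapes(width=416, height=416, num_classes=80,
                           anchors_per_scale=3, strides=(32, 16, 8)):
    """(rows, columns) of each YOLOv3 output array, one entry per stride."""
    row_length = 4 + 1 + num_classes  # cx, cy, w, h, objectness, class scores
    return [((width // s) * (height // s) * anchors_per_scale, row_length)
            for s in strides]


shapes = expected_output_shapes()
print(shapes)                           # → [(507, 85), (2028, 85), (8112, 85)]
print(sum(rows for rows, _ in shapes))  # → 10647
```

So a single 416 x 416 image yields 10647 candidate boxes, which is why the postprocessing step below has to filter them.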

In [8]:
def forward(net, blob, output_layers):
    net.setInput(blob)
    return net.forward(output_layers)

### Drawing box detections

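This is a utility function that draws a labelled box on a given image: the bounding box itself, a filled background just above its top-left corner, and the label text on top of that background. We will use it during postprocessing.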

In [9]:
def draw_box(image, label, box, box_color, text_color, font_scale, font, thickness):
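    x, y, w, h = box  # top-left corner plus width and height, in pixels
    # Size of the rendered label text, used to size its background rectangle
    tw, th = cv.getTextSize(text=label, fontFace=font, fontScale=font_scale, thickness=thickness)[0]
    # Bounding box outline
    cv.rectangle(image, (x, y), (x + w, y + h), box_color, thickness=thickness)
    # Filled background for the label, just above the box's top-left corner
    cv.rectangle(image, (x, y), (x + tw + 10, y - th - 10), color=box_color, thickness=cv.FILLED)
    # Label text drawn on top of that background
    cv.putText(image, label, (x + 5, y - 5), fontFace=font, fontScale=font_scale, color=text_color, thickness=thickness)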

### Process the output

Given an image, the network outputs and the class labels we want to use, this function converts every detection to pixel coordinates, discards detections whose confidence is below `threshold`, and removes overlapping boxes with non-maximum suppression (controlled by `nms_threshold`).

You might be confused by lines such as `center_x = detection[0] * width`. Why multiply the output by the width of the image? Each detection row starts with the box center and size (`cx, cy, w, h`), normalized to the range 0 to 1 relative to the width and height of the input, followed by an objectness score and one score per class. Multiplying by the image's width and height converts the box back to pixel coordinates we can draw with.
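The per-row conversion inside `postprocess` can be isolated and tried on a made-up detection row. This is a sketch with a hypothetical name (`decode_detection`); it applies the same arithmetic as the loop below:

```python
import numpy as np


def decode_detection(detection, image_width, image_height):
    """Convert one normalized YOLO row into a pixel box, class id and score."""
    center_x = detection[0] * image_width
    center_y = detection[1] * image_height
    w = detection[2] * image_width
    h = detection[3] * image_height
    x = center_x - w / 2  # top-left corner, as draw_box expects
    y = center_y - h / 2
    scores = detection[5:]  # skip cx, cy, w, h and the objectness score
    class_id = int(np.argmax(scores))
    return [int(x), int(y), int(w), int(h)], class_id, float(scores[class_id])


# Made-up row: box centred in the image, half its width, a quarter of its height.
row = np.zeros(85, dtype=np.float32)
row[:5] = [0.5, 0.5, 0.5, 0.25, 0.9]
row[5 + 2] = 0.8  # class index 2 gets the highest score
box, class_id, score = decode_detection(row, 640, 480)
print(box, class_id)  # → [160, 180, 320, 120] 2
```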

In [10]:
def postprocess(image, outputs, classes, threshold=0.8, nms_threshold=0.7, box_color=(0, 0, 0), text_color=(255, 255, 255), font_scale=0.5, font=cv.FONT_HERSHEY_SIMPLEX, thickness=2):
    height, width, _ = image.shape

    image = deepcopy(image)

    class_ids = []
    confidences = []
    boxes = []

    for o in outputs:
        for detection in o:
            center_x = detection[0] * width
            center_y = detection[1] * height
            w = detection[2] * width
            h = detection[3] * height
            x = center_x - w/2
            y = center_y - h/2

            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            boxes.append([int(x), int(y), int(w), int(h)])
            confidences.append(float(confidence))
            class_ids.append(class_id)

    indices = cv.dnn.NMSBoxes(boxes, confidences, threshold, nms_threshold)

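    # NMSBoxes returns an N x 1 array in older OpenCV versions and a flat array
    # (or an empty tuple) in newer ones; flattening handles all cases.
    for i in np.array(indices).flatten():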
        label = str(classes[class_ids[i]])
        draw_box(image, label, boxes[i], box_color, text_color, font_scale, font, thickness)
        
    return image, class_ids, confidences, boxes
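`cv.dnn.NMSBoxes` does the filtering in a single call. As a rough illustration of the idea, here is a greedy non-maximum suppression sketch in plain NumPy on the same `[x, y, w, h]` boxes: drop scores below `score_threshold`, then repeatedly keep the highest-scoring box and discard the remaining boxes whose intersection-over-union (IoU) with it exceeds `nms_threshold`. Like the call above, it ignores class ids. It is a simplified stand-in (`nms_sketch` is a hypothetical name), not OpenCV's exact implementation, so details such as tie handling may differ:

```python
import numpy as np


def iou(box, others):
    """IoU between one [x, y, w, h] box and an array of such boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union


def nms_sketch(boxes, scores, score_threshold, nms_threshold):
    """Return indices of boxes kept by greedy, class-agnostic NMS."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Candidates above the confidence threshold, highest score first.
    order = [int(i) for i in np.argsort(-scores) if scores[i] >= score_threshold]
    keep = []
    while order:
        best, rest = order[0], np.array(order[1:], dtype=int)
        keep.append(best)
        if rest.size == 0:
            break
        # Keep only the boxes that do not overlap the chosen box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = [int(i) for i in rest[overlaps <= nms_threshold]]
    return keep


boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores, score_threshold=0.5, nms_threshold=0.5))  # → [0, 2]
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed; the third box does not overlap either, so it survives.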

### Example time!

Using the functions above, the next cell loads a network and an image and forwards the image through the network. It then uses the `postprocess` function, with a confidence threshold of 0.2, to create a copy of the image with the detections drawn on it. Finally, we show this image.

In [11]:
weights = "yolov3.weights"
cfg = "yolov3.cfg"
classes = "coco.names"
image = "demo.jpg"

net = load(weights, cfg)

classes = prepare_classes(classes)

output_layers = extract_output_layers(net)

img = ut.load_image(image)

blob = image_to_blob(img)

outputs = forward(net, blob, output_layers)

processed_image,_,_,_ = postprocess(img, outputs, classes, 0.2)

ut.show_image(processed_image)

### What's next?

We have prepared two more notebooks:
* **image_workshop** gives you the same functionality as this notebook, in a more condensed package. This is where you should go to experiment with different parameters.
* **video_workshop** revolves around performing object detection on videos from YouTube.