# Convolutional Neural Network Object Detection
This notebook applies state-of-the-art object detection models to some example images. Faster R-CNN is covered first, followed by YOLO.

## Faster R-CNN

### Import Libraries

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import argparse
import cv2
import numpy as np
import matplotlib.pyplot as plt

from coco_names import COCO_INSTANCE_CATEGORY_NAMES as coco_names # the coco_names python script contains the classes of the objects
from PIL import Image


### Faster R-CNN
The pre-trained Faster R-CNN model is loaded here. The model has a ResNet-50 backbone with a feature pyramid network (FPN) and is loaded from the torchvision module. The min_size argument sets the size to which the shorter side of the image is rescaled before it is passed to the backbone; it is not a minimum size for the bounding boxes. A smaller value speeds up inference, but small objects then cover fewer pixels and are generally harder to detect.

In [None]:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True,min_size=800)
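
To make the effect of min_size more concrete, the sketch below mimics the resize rule that the detection model applies internally: the shorter image side is scaled towards min_size, unless that would push the longer side past max_size (1333 by default for this model). The helper name `resized_shape` is hypothetical and not part of torchvision, and the rounding is only an approximation of what the library does.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Approximate the image size after the detection model's internal resize."""
    # Scale so that the shorter side reaches min_size ...
    scale = min_size / min(height, width)
    # ... but never let the longer side exceed max_size.
    scale = min(scale, max_size / max(height, width))
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640))                # shorter side is scaled up to 800
print(resized_shape(480, 640, min_size=400))  # a smaller min_size shrinks the image
print(resized_shape(500, 2000))               # longer side is capped at max_size
```

A smaller min_size therefore means a smaller input image, which is why it trades detection quality on small objects for speed.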

### Predicting the Object Classes
A function is written here to detect objects and predict their classes and bounding boxes with the pre-trained Faster R-CNN.

The torchvision model takes as input an image in the form of a tensor with dimensions [batch_size x channels x height x width]. Therefore, the image first needs to be converted to a tensor. ToTensor also scales the pixel values to the range [0, 1], which the model expects.

In [None]:
transform = transforms.ToTensor()
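
As a rough illustration of what this conversion does, the sketch below reorders a tiny image from height x width x channels to channels x height x width and scales the values to [0, 1]. It uses plain nested lists instead of tensors, and `to_chw_float` is a hypothetical helper written for this example, not a torchvision function.

```python
def to_chw_float(image_hwc):
    """Reorder a nested-list H x W x C image to C x H x W and scale to [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A hypothetical 1 x 2 RGB image: one red pixel and one white pixel.
tiny = [[[255, 0, 0], [255, 255, 255]]]
chw = to_chw_float(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
print(chw[0])  # the red channel of both pixels
```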

In [None]:
def predict(image,model,detection_threshold):
    image = transform(image)
    image = image.unsqueeze(0) # add a batch dimension, because a single image is processed at a time
    with torch.no_grad(): # inference only, so no gradients are needed
        outputs = model(image)

    print(f"Boxes: {outputs[0]['boxes']}")
    print(f"Labels: {outputs[0]['labels']}")
    print(f"Scores: {outputs[0]['scores']}")

    # get the predicted label indices, scores and bounding boxes as numpy arrays
    pred_labels = outputs[0]['labels'].detach().numpy()
    pred_scores = outputs[0]['scores'].detach().numpy()
    pred_bboxes = outputs[0]['boxes'].detach().numpy()

    # keep only the detections with a score at or above the pre-defined threshold
    keep = pred_scores >= detection_threshold
    boxes = pred_bboxes[keep].astype(np.int32)
    labels = pred_labels[keep]

    # look up the class names of the detections that were kept
    pred_classes = [coco_names[i] for i in labels]

    return boxes, pred_classes, labels
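
The thresholding step can be illustrated in isolation. The sketch below filters boxes, labels and scores together so that the three stay aligned. It uses plain Python lists with made-up values in place of real model output, and `filter_detections` is a hypothetical helper name.

```python
def filter_detections(boxes, labels, scores, threshold):
    """Keep the detections whose score is at or above the threshold."""
    kept = [
        (box, label, score)
        for box, label, score in zip(boxes, labels, scores)
        if score >= threshold
    ]
    # Unzip back into three parallel lists (all empty if nothing survives).
    kept_boxes = [box for box, _, _ in kept]
    kept_labels = [label for _, label, _ in kept]
    kept_scores = [score for _, _, score in kept]
    return kept_boxes, kept_labels, kept_scores


# Hypothetical toy detections, not real model output.
boxes = [[10, 20, 110, 220], [30, 40, 90, 100], [5, 5, 50, 50]]
labels = [1, 18, 1]
scores = [0.98, 0.85, 0.42]

kept_boxes, kept_labels, kept_scores = filter_detections(boxes, labels, scores, 0.8)
print(kept_boxes)
print(kept_labels)
```

Raising the threshold removes more low-confidence detections; lowering it keeps more boxes at the cost of more false positives.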

### Drawing the Bounding Box
A function is written here to draw the bounding boxes around the detected objects in the image.  

An image can contain many objects of different classes. Therefore, bounding boxes of the same class are drawn in the same colour, which makes the detections easier to visualise.

In [None]:
COLORS = np.random.uniform(0,255,size=(len(coco_names),3)) # one random 3-dimensional colour vector per class label, with values between 0 and 255
COLORS

In [None]:
def draw_boxes(boxes,classes,labels,image):
    # convert the PIL image (RGB) to an OpenCV array (BGR)
    image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)

    for i, box in enumerate(boxes):
        color = COLORS[labels[i]]
        start_x = int(box[0])
        start_y = int(box[1])
        end_x = int(box[2])
        end_y = int(box[3])
        cv2.rectangle(image,(start_x,start_y),(end_x,end_y),color,2) # rectangle needs the image, the starting box coordinates, the ending box coordinates, the color and the line thickness as input
        cv2.putText(image,classes[i],(start_x,start_y-5), cv2.FONT_HERSHEY_SIMPLEX,0.8,color,2) # putText needs the image, the starting coordinates of the text, the font, the size, the text color and the letter thickness as input

    return image
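
The class name is placed 5 pixels above the top-left corner of each box, so it can fall outside the image when a box touches the top edge. One possible refinement, shown here only as a sketch, is to move the text inside the box in that case. The helper `label_origin` and the assumed text height of 20 pixels are hypothetical and not part of OpenCV; the helper relies on the fact that the text origin passed to putText is the bottom-left corner of the text.

```python
def label_origin(start_x, start_y, margin=5, text_height=20):
    """Return the bottom-left corner for the label text of a box."""
    # Preferred position: just above the top edge of the box.
    above_y = start_y - margin
    if above_y - text_height >= 0:
        return start_x, above_y
    # Not enough room above the box: draw the text just inside the top edge.
    return start_x, start_y + margin + text_height


print(label_origin(10, 100))  # enough room, text goes above the box
print(label_origin(10, 2))    # box touches the top edge, text goes inside
```

The returned tuple could be passed to cv2.putText in place of (start_x,start_y-5).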

### Results

The Faster R-CNN model is now used with the functions defined above to perform object detection on three example images.

For each image, a bounding box is drawn around every object that is detected with a score of at least the detection threshold. Each box is annotated with the class name only.

In [None]:
images_ex = ['horses.jpg','people.jpg','street.jpg']

In [None]:
model.eval()

Image example 1:

In [None]:
image = Image.open(images_ex[0])
boxes, classes, labels = predict(image,model,0.8)
image = draw_boxes(boxes,classes,labels,image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)

Image example 2:

In [None]:
image = Image.open(images_ex[1])
boxes, classes, labels = predict(image,model,0.8)
image = draw_boxes(boxes,classes,labels,image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)

Image example 3:

In [None]:
image = Image.open(images_ex[2])
boxes, classes, labels = predict(image,model,0.8)
image = draw_boxes(boxes,classes,labels,image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)

## YOLO

### Import Libraries

In [None]:
import torch
import cv2
from PIL import Image

### YOLO v5
The pre-trained YOLO v5 model is loaded here from the ultralytics repository via torch.hub. First install the model requirements in your virtual environment: pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt

In [None]:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

### Images
Some example images from ultralytics are downloaded here and used for object detection.

In [None]:
examples = ['zidane.jpg', 'bus.jpg']
for f in examples:
    torch.hub.download_url_to_file('https://ultralytics.com/images/' + f, f) 
im1 = Image.open('zidane.jpg')  # PIL image
im2 = cv2.imread('bus.jpg')[..., ::-1]  # OpenCV image (BGR to RGB)

### Results
The YOLO v5 model is now used to perform object detection on the two downloaded example images. Batch inference is used, so both images are passed to the model in a single call.

The YOLO v5 model detects all the objects within the image. A bounding box is placed around the objects. These are annotated with the class name and probability.

In [None]:
results = model([im1, im2], size=640) # size = inference image size in pixels; the list of images forms the batch
results.print()  
results.show()
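
Besides printing and showing the results, the detections can be processed further. The sketch below assumes that each detection is a row of the form [x1, y1, x2, y2, confidence, class index], which is the layout the YOLO v5 results are commonly documented to use in results.xyxy. The rows and the partial name lookup are made-up toy values, and `describe_detections` is a hypothetical helper.

```python
def describe_detections(rows, names, min_confidence=0.5):
    """Turn [x1, y1, x2, y2, confidence, class] rows into readable dicts."""
    described = []
    for x1, y1, x2, y2, confidence, class_id in rows:
        if confidence < min_confidence:
            continue  # skip low-confidence detections
        described.append({
            "name": names[int(class_id)],
            "confidence": round(confidence, 2),
            "box": (round(x1), round(y1), round(x2), round(y2)),
        })
    return described


# Hypothetical class lookup and toy rows, not real model output.
names = {0: "person", 27: "tie"}
rows = [
    [743.3, 48.3, 1141.8, 720.0, 0.88, 0.0],
    [441.9, 437.3, 496.6, 710.0, 0.66, 27.0],
    [123.1, 193.2, 714.7, 719.8, 0.31, 0.0],
]

detections = describe_detections(rows, names)
for detection in detections:
    print(detection)
```

With a real model, rows for the first image could be obtained with something like results.xyxy[0].tolist() and the lookup from model.names, assuming the installed YOLO v5 version exposes those attributes.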