# Exercise 1: Object Detection

In this problem, we will explore some of the features that torchvision (a PyTorch library) offers for visualizing images, performing object detection with pretrained CNN models, and extracting and plotting the detected bounding boxes. We'll start by downloading a set of images that we'll use for detection.


In [None]:
import os
import torch
import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import torchvision.transforms.functional as F
from torchvision.utils import draw_bounding_boxes
from torchvision.io import read_image
from torchvision.models.detection import fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights
%matplotlib inline

imagefolder = os.path.join(os.getcwd(), 'images')

def show(img: torch.Tensor) -> None:
    """
    Plots a torch image Tensor.

    Args:
        img: (3, H, W) torch Tensor of an image.
    """
    img = img.detach()
    img = F.to_pil_image(img)
    plt.imshow(np.asarray(img))
    plt.show()

#### Exercise 1.1: Object detection with pretrained models

In the first part of this problem, we will perform object detection using pretrained models. By *pretrained*, we mean that the model's weights have already been trained and will not change over the course of this problem. We will use a pretrained torchvision `fcos_resnet50_fpn()` model and a pretrained YOLOv5 model.
Your task is to implement:
1. `fcos_resnet50`: evaluates an image using the model described [here](https://pytorch.org/vision/0.16/models/generated/torchvision.models.detection.fcos_resnet50_fpn.html#torchvision.models.detection.fcos_resnet50_fpn). See the documentation for details on how to load and evaluate the model.
2. `yolo_v5`: evaluates an image using the YOLOv5 model described [here](https://github.com/ultralytics/yolov5). See the documentation for details on how to load and evaluate the model.

Both functions will take an image as input and output `boxes`, `scores`, and `labels` for each object detected in the image.
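
As a rough illustration of the target output format, the sketch below unpacks mock detections rather than real model output. It assumes the two documented conventions: torchvision detection models return one dict per image with `boxes`, `scores`, and integer `labels`, and YOLOv5's `results.xyxy[0]` is an `(N, 6)` tensor of `x1, y1, x2, y2, confidence, class`. The `CATEGORIES` list, the helper names, and all numbers are hypothetical stand-ins. A single list is shared here for brevity; with the real models the index-to-name maps differ and would come from `weights.meta["categories"]` and `results.names`.

```python
import torch

# Hypothetical stand-in for weights.meta["categories"] / results.names.
CATEGORIES = ["__background__", "person", "bicycle", "car"]


def unpack_torchvision(prediction: dict) -> tuple[torch.Tensor, torch.Tensor, list[str]]:
    """Split one torchvision-style prediction dict into boxes, scores, label names."""
    boxes = prediction["boxes"]    # (N, 4), xyxy pixel coordinates
    scores = prediction["scores"]  # (N,)
    labels = [CATEGORIES[int(i)] for i in prediction["labels"]]
    return boxes, scores, labels


def unpack_yolo(detections: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, list[str]]:
    """Split an (N, 6) YOLOv5-style tensor: x1, y1, x2, y2, confidence, class."""
    boxes = detections[:, :4]
    scores = detections[:, 4]
    labels = [CATEGORIES[int(c)] for c in detections[:, 5]]
    return boxes, scores, labels


# Mock outputs for a single image with two detections (made-up values).
mock_tv = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 100.0]]),
    "scores": torch.tensor([0.9, 0.3]),
    "labels": torch.tensor([1, 3]),
}
mock_yolo = torch.tensor([
    [10.0, 20.0, 110.0, 220.0, 0.9, 1.0],
    [50.0, 60.0, 90.0, 100.0, 0.3, 3.0],
])

tv_boxes, tv_scores, tv_labels = unpack_torchvision(mock_tv)
yolo_boxes, yolo_scores, yolo_labels = unpack_yolo(mock_yolo)
print(tv_labels, yolo_labels)
```

Returning label names as strings, rather than integer class ids, keeps both functions compatible with the visualization step later in this exercise.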

In [None]:
def fcos_resnet50(image: torch.Tensor):
    """
    Evaluate the fcos_resnet50_fpn model with the given image.

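    Args:
        image: (3, H, W) torch Tensor of an image with values 0-255.

    Returns:
        boxes: (N, 4) torch Tensor of detected bounding boxes in xyxy format.
        scores: (N,) torch Tensor of confidence scores.
        labels: (N,) length list of string label names.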
    """
    weights = FCOS_ResNet50_FPN_Weights.DEFAULT
    ########## Code starts here ##########
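    # Load the pretrained FCOS model with `weights` and put it in eval mode
    # Load the preprocessing transforms from the weights
    # Preprocess the input image
    # Run inference (the model takes a list of images and returns one dict per image)
    # Convert the predicted label indices to names with weights.meta["categories"]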
    
    ########## Code ends here ############

    return boxes, scores, labels

In [None]:
def yolo_v5(image: torch.Tensor):
    """
    Evaluate the YOLOv5 model with the given image.

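    Args:
        image: (3, H, W) torch Tensor of an image with values 0-255.

    Returns:
        boxes: (N, 4) torch Tensor of detected bounding boxes in xyxy format.
        scores: (N,) torch Tensor of confidence scores.
        labels: (N,) length list of string label names.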
    """
    # YOLOv5 can be loaded directly from torch.hub
    # This will download the model if not already cached
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
    
    ########## Code starts here ##########
    # Hints:
    # 1. YOLOv5 models can be loaded using torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
    #    Available models: yolov5n, yolov5s, yolov5m, yolov5l, yolov5x (n=nano, s=small, m=medium, l=large, x=extra large)
    # 2. YOLOv5 models are in evaluation mode by default when loaded with pretrained=True
    # 3. YOLOv5 can handle single images directly (no need for manual batching)
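    # 4. YOLOv5 handles normalization internally when given PIL Images, numpy arrays, or file paths.
    #    A torch.Tensor input is passed to the network as-is, so convert `image` first
    #    (e.g. with F.to_pil_image(image)).
    # 5. The model returns a Results object with detection information
    #    (e.g. results.xyxy[0] is an (N, 6) tensor: x1, y1, x2, y2, confidence, class)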
    
    ########## Code ends here ############

    return boxes, scores, labels

#### Exercise 1.2: Visualize Object Detections
Implement the function `draw_result`, which returns a copy of the image overlaid with the detected bounding boxes, their labels, and their scores.
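
The overlay text is one string per detection. A minimal sketch of building such strings from class names and scores follows; `make_captions` is a hypothetical helper name and the values are made up. The resulting list is the kind of value that the `labels` argument of `draw_bounding_boxes` accepts.

```python
from typing import Iterable


def make_captions(names: list[str], scores: Iterable[float]) -> list[str]:
    """Pair each class name with its score, formatted to two decimals."""
    # float() also accepts 0-dim torch Tensors, so a (N,) score Tensor can be passed as-is.
    return [f"{name}: {float(score):.2f}" for name, score in zip(names, scores)]


captions = make_captions(["person", "car"], [0.912, 0.3])
print(captions)  # ['person: 0.91', 'car: 0.30']
```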

In [None]:
def draw_result(
    image: torch.Tensor,
    boxes: torch.Tensor,
    labels: list[str],
    scores: torch.Tensor,
) -> torch.Tensor:
    """
    Draw bounding box visualization on top of the raw image.

    Args:
        image: (3, H, W) torch Tensor of an image with values 0-255.
        boxes: (N, 4) torch Tensor of detected bounding boxes in xyxy format.
        labels: (N,) length list of string label names
        scores: (N,) torch Tensor of confidence scores.

    Returns:
        img_with_bbox: (3, H, W) torch Tensor of an image with bounding boxes drawn atop.
    """
    ########## Code starts here ##########
    # Hints:
    # 1. This function is small; the solution is only a couple of lines.
    # 2. See documentation for draw_bounding_boxes here:
    #    https://pytorch.org/vision/stable/generated/torchvision.utils.draw_bounding_boxes.html
    # 3. You need to create label strings that combine class names and confidence scores
    # 4. Format strings like f"{label}: {score:.2f}" work well for labels

    ########## Code ends here ##########

    return img_with_bbox

Run the code below with a selected image and model to evaluate.

In [None]:
# NOTE: Select an image on which to perform object detection.
imagepath = os.path.join(imagefolder, 'airport.jpeg')
# imagepath = os.path.join(imagefolder, 'cars.jpeg')
# imagepath = os.path.join(imagefolder, 'dog_park.jpeg')

# Load image
image = read_image(imagepath)
show(image)

# NOTE: Select a model to explore
boxes, scores, labels = fcos_resnet50(image)
# boxes, scores, labels = yolo_v5(image)

# Show overlaid image with detections
show(draw_result(image, boxes, labels, scores))

#### Exercise 1.3: Filter Low Confidence Detections
Implement the function `filter`, which returns only the detections whose score is at or above a given threshold. Note that this name shadows Python's built-in `filter` for the rest of the notebook.
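
Score filtering relies on boolean-mask indexing. The toy example below, using made-up values and hypothetical variable names, shows how a mask built from a comparison selects the matching rows of a tensor, and why a Python list of labels needs a separate comprehension.

```python
import torch

scores = torch.tensor([0.9, 0.3, 0.6])
boxes = torch.tensor([
    [0.0, 0.0, 10.0, 10.0],
    [5.0, 5.0, 20.0, 20.0],
    [1.0, 2.0, 3.0, 4.0],
])
names = ["person", "car", "dog"]

mask = scores >= 0.5        # boolean Tensor: True, False, True
kept_boxes = boxes[mask]    # rows 0 and 2, shape (2, 4)
kept_scores = scores[mask]  # values 0.9 and 0.6
# Lists do not support mask indexing, so filter them element by element.
kept_names = [n for n, keep in zip(names, mask.tolist()) if keep]
print(kept_names)  # ['person', 'dog']
```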

In [None]:
def filter(boxes: torch.Tensor, labels: list[str], scores: torch.Tensor, threshold: float):
    """
    Filter the detections based on scores and a given threshold.

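    Args:
        boxes: (N, 4) torch Tensor of detected bounding boxes in xyxy format.
        labels: (N,) length list of string label names.
        scores: (N,) torch Tensor of confidence scores.
        threshold: minimum score a detection must have to be kept.

    Returns:
        filtered boxes
        filtered labels
        filtered scores
    """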
    ########## Code starts here ##########
    # Hints for threshold exploration:
    # 1. Can define a boolean mask by: mask = scores >= threshold
    # 2. Can directly mask torch.Tensor with x[mask]
    mask = scores >= threshold
    filtered_labels = [label for label, m in zip(labels, mask) if m]
    return boxes[mask], filtered_labels, scores[mask]
    ########## Code ends here ##########

Run the code below to show the filtered object detections.

In [None]:
show(draw_result(image, *filter(boxes, labels, scores, 0.5)))