In [3]:
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader, Subset

**Classification and Localization:**

- Localizing an object in a picture can be expressed as a regression task. To predict a bounding box around the object, a common approach is to predict the horizontal and vertical coordinates of the object's center, as well as its height and width (4 numbers in total). This doesn't require much change to the model: just add a second dense output layer with 4 units, which can be trained using the MSE loss.

In [4]:
feature_extractor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

In [6]:
class LocalizationModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Reuse the pretrained ResNet-18 without its final fully connected layer
        self.feature_extractor = nn.Sequential(*list(feature_extractor.children())[:-1])
        # Two heads on the shared 512-dim features
        self.classifier = nn.Linear(512, 1)  # binary class score
        self.localizer = nn.Linear(512, 4)   # box center x, center y, height, width
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.feature_extractor(x)
        x = torch.flatten(x, 1)  # (batch, 512, 1, 1) -> (batch, 512)
        class_output = self.sigmoid(self.classifier(x))  # probability in [0, 1]
        loc_output = self.localizer(x)  # raw regression outputs
        return class_output, loc_output


In [7]:
model = LocalizationModel()
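
A model with two heads needs a loss that combines both outputs. The following is a minimal sketch, not the notebook's actual training code. It assumes binary class labels (so `nn.BCELoss` matches the sigmoid output) and uses the MSE loss mentioned above for the box. The function name `localization_loss` and the `loc_weight` parameter are hypothetical, and the tensors are dummy data shaped like the model's outputs.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # assumes the classifier head already applies a sigmoid
mse = nn.MSELoss()  # regression loss on the 4 bounding box numbers

def localization_loss(class_pred, loc_pred, class_target, box_target, loc_weight=1.0):
    """Weighted sum of the classification and box regression losses.

    loc_weight is a hypothetical knob for balancing the two terms.
    """
    class_loss = bce(class_pred, class_target)
    loc_loss = mse(loc_pred, box_target)
    return class_loss + loc_weight * loc_loss

# Dummy batch of 8 items, shaped like the outputs of the two heads
class_pred = torch.rand(8, 1)                        # probabilities in [0, 1)
loc_pred = torch.rand(8, 4)                          # predicted boxes
class_target = torch.randint(0, 2, (8, 1)).float()   # 0/1 labels
box_target = torch.rand(8, 4)                        # normalized target boxes

loss = localization_loss(class_pred, loc_pred, class_target, box_target)
```

In a training loop, `loss.backward()` would then propagate gradients through both heads and the shared feature extractor.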

Problem:
- Our original dataset doesn't have bounding boxes around the objects in the images, so we need to add them ourselves. To annotate the images with bounding boxes, you may want to use an open source image labelling tool like VGG Image Annotator, LabelImg, OpenLabeler, or ImgLab, or perhaps a commercial tool like LabelBox or Supervisely. You can also look into crowdsourcing platforms like Amazon Mechanical Turk if you have a very large number of images to annotate - <a href="https://arxiv.org/pdf/1611.02145">Paper Reference</a>.

- Now suppose you've obtained the bounding boxes for every image in the dataset (assume a single bounding box per image for now). You then need to create a dataset whose items will be batches of preprocessed images along with their class labels and bounding boxes. Each item should be a tuple of the form (images, (class_labels, bounding_boxes)).

- Bounding boxes should be normalized so that the horizontal and vertical coordinates, as well as the height and width all range from 0 to 1.

- The MSE often works fairly well as a loss but isn't a great metric to evaluate how well the model can predict bounding boxes. The most common metric is the Intersection over Union (IoU): the area of overlap between the predicted bounding box and the target bounding box, divided by the area of their union.

- <img src="../../data/report_images/iou.png" alt="IOU Image">
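
The normalization and IoU points above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: annotations are taken to be pixel corner boxes `(x_min, y_min, x_max, y_max)`, which may not match what your labelling tool exports, and the helper names `normalize_box` and `iou` are hypothetical.

```python
def normalize_box(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel corner box to normalized (cx, cy, w, h), each in [0, 1]."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return cx, cy, w, h

def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    def corners(box):
        cx, cy, w, h = box
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)
    # Overlap extent along each axis, clamped to 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 100x100 pixel boxes in a 200x200 image, offset horizontally by 50 pixels
target = normalize_box(50, 50, 150, 150, 200, 200)   # (0.5, 0.5, 0.5, 0.5)
pred = normalize_box(100, 50, 200, 150, 200, 200)    # (0.75, 0.5, 0.5, 0.5)
score = iou(target, pred)                            # overlap 0.125 / union 0.375 = 1/3
```

For batched tensors, torchvision also provides `torchvision.ops.box_iou`, which expects boxes in corner `(x1, y1, x2, y2)` format rather than the center format used here.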

**Object Detection:**
- Object detection is the task of classifying and localizing multiple objects in an image.