# <font style="color:blue">Project 3: Object Detection</font>


#### Maximum Points: 100

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Section</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>Plot Ground Truth Bounding Boxes</h3></td> <td><h3>20</h3></td> </tr>
        <tr><td><h3>2</h3></td> <td><h3>Training</h3></td> <td><h3>25</h3></td> </tr>
        <tr><td><h3>3</h3></td> <td><h3>Inference</h3></td> <td><h3>15</h3></td> </tr>
        <tr><td><h3>4</h3></td> <td><h3>COCO Detection Evaluation</h3></td> <td><h3>25</h3></td> </tr>
        <tr><td><h3>5</h3></td> <td><h3>Run Inference on a Video</h3></td> <td><h3>15</h3></td> </tr>
    </table>
</div>


# <font style="color:orange">Project Approach</font>

This project fine-tunes a pretrained **[TorchVision RetinaNet](https://pytorch.org/vision/0.8/models.html#retinanet)** model using the **Vehicle Registration Plate** dataset. For brevity, a vehicle registration plate is referred to as a license plate. RetinaNet was introduced in the [Focal Loss for Dense Object Detection](https://www.paperswithcode.com/method/retinanet) paper, and its architecture is depicted below.

![RetinaNet network architecture](https://www.paperswithcode.com/media/methods/Screen_Shot_2020-06-07_at_4.22.37_PM.png)

# <font style="color:purple">Download the Dataset</font> 

**[Download the Vehicle registration plate](https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1)**


Download the Vehicle Registration Plate dataset from [here](https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1) and unzip it. 

We will have the following directory structure:

```
Dataset
├── train
│   └── Vehicle registration plate
│       └── Label
└── validation
    └── Vehicle registration plate
        └── Label
```

Unzipping the file produces a `Dataset` directory with two folders, `train` and `validation`. Each of them contains a `Vehicle registration plate` folder that holds the `.jpg` images and a `Label` folder. The `Label` folder holds the bounding box data for the images.


For example, the image `Dataset/train/Vehicle registration plate/bf4689922cdfd532.jpg` has the label file `Dataset/train/Vehicle registration plate/Label/bf4689922cdfd532.txt`.
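
The image-to-label mapping above can be sketched as a small helper. This is only an illustration: the function name `label_path_for` is hypothetical and is not part of the project code, and it assumes the directory layout described above.

```python
from pathlib import PurePosixPath


def label_path_for(image_path):
    """Return the label file path for a dataset image path.

    Assumes labels live in a `Label` folder next to the images
    and share the image's base name.
    """
    image = PurePosixPath(image_path)
    return str(image.parent / "Label" / (image.stem + ".txt"))


print(label_path_for("Dataset/train/Vehicle registration plate/bf4689922cdfd532.jpg"))
```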

There are one or more lines in each `.txt` file. Each line represents one bounding box.
For example,
```
Vehicle registration plate 385.28 445.15 618.24 514.225
Vehicle registration plate 839.68 266.066462 874.24 289.091462
```

This is a single-class detection problem (`Vehicle registration plate` detection). Because the class name itself occupies the first three space-separated columns, the bounding box values start at the fourth column of each row.

Representation is in `xmin`, `ymin`, `xmax`, and `ymax` format.
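
A minimal sketch of parsing one label line follows. The helper name `parse_label_line` is hypothetical; it assumes every line ends with the four coordinates and that everything before them is the class name.

```python
def parse_label_line(line, class_name="Vehicle registration plate"):
    """Parse one label line into (class_name, [xmin, ymin, xmax, ymax])."""
    # The class name contains spaces, so take the last four
    # whitespace-separated tokens as the box coordinates.
    tokens = line.split()
    name = " ".join(tokens[:-4])
    box = [float(token) for token in tokens[-4:]]
    if name != class_name:
        raise ValueError(f"unexpected class name: {name!r}")
    xmin, ymin, xmax, ymax = box
    if not (xmax > xmin and ymax > ymin):
        raise ValueError(f"degenerate box: {box}")
    return name, box


name, box = parse_label_line("Vehicle registration plate 385.28 445.15 618.24 514.225")
print(name, box)
```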

**The dataset has `5308` training images and `386` validation images.**

Data is downloaded from [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html)

## <font style="color:orange">Download/Unzip Commands</font>

To install the dataset, run the following commands.

```
wget -O Dataset.zip https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1
unzip Dataset.zip
rm Dataset.zip
mv Dataset/ dataset/
```

The output should be similar to the following.


```
--2021-02-10 19:58:37--  https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
...
HTTP request sent, awaiting response... 200 OK
Length: 1764437533 (1.6G) [application/binary]
Saving to: ‘Dataset.zip’

Dataset.zip         100%[===================>]   1.64G  25.2MB/s    in 45s     

2021-02-10 19:59:24 (37.2 MB/s) - ‘Dataset.zip’ saved [1764437533/1764437533]

Archive:  Dataset.zip
   creating: Dataset/
   creating: Dataset/validation/
   creating: Dataset/validation/Vehicle registration plate/
  inflating: Dataset/validation/Vehicle registration plate/52ceb1fc30b413e5.jpg  
  inflating: Dataset/validation/Vehicle registration plate/182268e1f8c6525f.jpg
  ...
```

In [None]:
#! wget -O Dataset.zip https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1
#! unzip Dataset.zip
#! rm Dataset.zip
#! mv Dataset/ dataset/

In [None]:
#import collections
import json  # used by ModelEvaluator to write the COCO results file
import math
import os
#import shutil
#import sys

import albumentations as A
from albumentations.pytorch import ToTensor, ToTensorV2

import cv2
import matplotlib.pyplot as plt
import numpy as np
import skimage.io
import torch
import torch.optim as optim
import torch.optim.lr_scheduler as sched
import torchvision
from IPython.utils import io  # io.capture_output() is used by ModelEvaluator
from PIL import Image
from pycocotools.cocoeval import COCOeval  # used by ModelEvaluator
from torch.utils.data import Dataset, DataLoader
from torch.utils.tensorboard import SummaryWriter
#from torchvision import transforms
from tqdm import tqdm


runs_dir = "./runs"       # the root directory for TensorBoard logs
data_dir = "./dataset"    # the root directory for the Vehicle Registration Plate dataset
model_dir = "./models"    # the root directory of models

## <font style="color:orange">Datastore</font>

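The `Datastore` class validates the dataset directory layout, parses each label file, and builds an in-memory list of image records (id, path, size, and target) for the training and validation splits.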

In [None]:
class Datastore(object):
    def __init__(self, data_dir):
        self.__data_dir = data_dir
        self.__train_dir = os.path.join(data_dir, "train")
        self.__train_image_dir = os.path.join(data_dir, "train", "Vehicle registration plate")
        self.__train_label_dir = os.path.join(data_dir, "train", "Vehicle registration plate", "Label")
        self.__valid_dir = os.path.join(data_dir, "validation")
        self.__valid_image_dir = os.path.join(data_dir, "validation", "Vehicle registration plate")
        self.__valid_label_dir = os.path.join(data_dir, "validation", "Vehicle registration plate", "Label")
        if not self.__is_data_set_valid():
            raise IOError(f"'{data_dir}' is not a valid Vehicle Registration Plate dataset.")
        self._train_images = self.__process_data(self.__train_image_dir, self.__train_label_dir)
        self._valid_images = self.__process_data(self.__valid_image_dir, self.__valid_label_dir)

    def __is_data_set_valid(self):
        paths_and_counts = [
            (self.__data_dir, 2),
            (self.__train_dir, 1),
            (self.__train_image_dir, 5309),
            (self.__train_label_dir, 5308),
            (self.__valid_dir, 1),
            (self.__valid_image_dir, 387),
            (self.__valid_label_dir, 386)
        ]
       
        for path, count in paths_and_counts:
            if not os.path.isdir(path) or len(os.listdir(path)) != count:
                return False

        return True

    def __process_annotations(self, ann_path):
        # Each line is "<class name (3 tokens)> xmin ymin xmax ymax".
        boxes = []
        with open(ann_path, 'r') as f:  # the context manager closes the file
            for line in f:
                tokens = line.split()
                if not tokens:
                    continue  # skip blank lines
                xmin, ymin, xmax, ymax = (float(token) for token in tokens[3:7])
                assert xmax > xmin
                assert ymax > ymin
                boxes.append([xmin, ymin, xmax, ymax])
        return boxes
        
    def __create_target(self, image_id, boxes):
        num_objs = len(boxes)
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        return {
            "boxes": boxes, 
            "labels": torch.zeros((num_objs,), dtype=torch.int64), 
            "image_id": torch.tensor([image_id]), 
            "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]), 
            "iscrowd": torch.zeros((num_objs,), dtype=torch.int64)            
        }
        
    def __process_data(self, image_dir, label_dir):
        images = []
        for idx, img_name in enumerate(os.listdir(image_dir)):
            image_path = os.path.join(image_dir, img_name)
            label_path = os.path.join(label_dir, os.path.splitext(img_name)[0] + ".txt")           
            if os.path.isfile(image_path):
                img = Image.open(image_path)
                size = img.size
                img.close()
                del img
                images.append({
                    "id": idx,
                    "path": image_path,
                    "size": size,
                    "target": self.__create_target(idx, self.__process_annotations(label_path))
                })               
        return images
        
    def __get_images(self, images, ids):
        if ids is None:
            return images
        return [images[id] for id in ids]
        
    def get_train_images(self, ids=None):
        return self.__get_images(self._train_images, ids)

    def get_valid_images(self, ids=None):
        return self.__get_images(self._valid_images, ids)

    
datastore = Datastore(data_dir)

## <font style="color:orange">Dataset Analysis</font>

It is useful to understand the dataset's image dimensions when creating transforms. The following cell visualizes the image widths, heights, and aspect ratios as histograms.

In [None]:
def analyze_image_dims():
    # create list of training and validation samples
    samples = []
    samples.extend(datastore.get_train_images())
    samples.extend(datastore.get_valid_images())

    def get_image_dims(sample):
        size = sample["size"]
        return (size[0], size[1], float(size[0])/size[1])    

    image_dims = tuple(zip(*[get_image_dims(sample) for sample in samples]))
    
    def plot_histogram(ax, xlabel, data):
        min = np.min(data)
        max = np.max(data)
        ave = np.mean(data)
        title = f"{xlabel} Histogram (Min: {min:.2f}, Max: {max:.2f}, Ave: {ave:.2f})"
        ax.hist(data, facecolor=(1., 0.549, 0))
        ax.set_xlabel(xlabel)
        ax.set_ylabel("Count")
        ax.set_title(title)

    fig, axs = plt.subplots(3, 1)
    fig.set_figwidth(6)
    fig.set_figheight(9)

    xlabels = ("Width", "Height", "Aspect Ratio")
    for ax, xlabel, data in zip(axs, xlabels, image_dims):
        plot_histogram(ax, xlabel, data)

    fig.tight_layout()
    plt.show()
    
    del samples
    del image_dims


analyze_image_dims()

#  <font style="color:green">1. Plot Ground Truth Bounding Boxes [20 Points]</font> 

**You have to show three images from validation data with the bounding boxes.**

The plotted images should be similar to the following:

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g1.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g2.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g3.png'>



## <font style="color:orange">Annotator</font>

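The `Annotator` class draws each bounding box onto an image together with a filled text label (`lic. plate` for ground truth, or the confidence score when scores are supplied) and displays the result with Matplotlib.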

In [None]:
class Annotator(object):
    @classmethod
    def __annotate_image(cls, img_bgr, bboxes, scores=None, color=(255, 140, 0)):
        scale = 1
        thick = 1
        white = (255, 255, 255)
        for idx, bbox in enumerate(bboxes):
            xminb, yminb, xmaxb, ymaxb = bbox
            xminb = int(xminb + 0.5)
            yminb = int(yminb + 0.5)
            xmaxb = int(xmaxb + 0.5)
            ymaxb = int(ymaxb + 0.5)
            cv2.rectangle(img_bgr, (xminb, yminb), (xmaxb, ymaxb), color, 5)

            text = "lic. plate"
            if scores is not None:
                text = format(scores[idx], ".0%")
            text_size, _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_DUPLEX, scale, thick)

            pad = 8
            xmint = xminb
            ymint = yminb - text_size[1] - pad
            xmaxt = xminb + text_size[0] + pad
            ymaxt = yminb
            xtext = xmint + int(pad / 2)
            ytext = ymint + text_size[1] + int(pad / 2)
            cv2.rectangle(img_bgr, (xmint, ymint), (xmaxt, ymaxt), color, 5)
            cv2.rectangle(img_bgr, (xmint, ymint), (xmaxt, ymaxt), color, -1)
            cv2.putText(img_bgr, text, (xtext, ytext), cv2.FONT_HERSHEY_DUPLEX, scale, white, thick)
        return img_bgr
    
    @classmethod
    def annotate_image_datstore(cls, image, color=(255, 140, 0)):
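        # cv2.imread returns BGR; convert to RGB so that Matplotlib shows true colors
        # and the (R, G, B) annotation color renders as intended.
        img_rgb = cv2.cvtColor(cv2.imread(image["path"]), cv2.COLOR_BGR2RGB)
        bboxes = image["target"]["boxes"]
        return cls.__annotate_image(img_rgb, bboxes, None, color)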
    
    #@classmethod
    #def annotate_image_bgr_pred(cls, img_bgr, bboxes, scores, color=(255, 140, 0), vis_threshold = 0.4):
    #    mask = scores > vis_threshold
    #   return cls.__annotate_image(img_bgr, bboxes[mask], scores[mask], color)
    
    #@classmethod
    #def annotate_image_name_pred(cls, img_name, bboxes, scores, color=(255, 140, 0), vis_threshold = 0.4):
    #    img_bgr = cv2.imread(os.path.join(self.__image_dir, img_name))
    #    return cls.annotate_image_bgr_pred(img_bgr, bboxes, scores, color, vis_threshold)

    @classmethod
    def show_image(cls, img):
        plt.figure(figsize=(12,12))
        plt.imshow(img)
        plt.show()
    

samples = datastore.get_valid_images([4, 5, 6])

In [None]:
Annotator.show_image(Annotator.annotate_image_datstore(samples[0]))

In [None]:
Annotator.show_image(Annotator.annotate_image_datstore(samples[1]))

In [None]:
Annotator.show_image(Annotator.annotate_image_datstore(samples[2]))

## <font style="color:orange">LicensePlateDataset</font>

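The `LicensePlateDataset` class wraps the `Datastore` records as a dataset that returns an `(image, target)` pair per index, where `target` holds the `boxes` and `labels` tensors. It can optionally be restricted to an evenly spaced subset of the images via `subset_size`, a fraction in (0, 1].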
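
The `subset_size` option keeps roughly that fraction of the images, spread evenly through the list. The pure-Python sketch below mirrors the bucketing idea behind the class's subset helper, which uses `np.unique` instead; the name `evenly_spaced_subset` is hypothetical.

```python
def evenly_spaced_subset(num_items, fraction):
    """Return indices of an evenly spaced subset of range(num_items)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0., 1.]")
    kept, seen_buckets = [], set()
    for idx in range(num_items):
        bucket = int(idx * fraction)  # consecutive indices can share a bucket
        if bucket not in seen_buckets:  # keep only the first index per bucket
            seen_buckets.add(bucket)
            kept.append(idx)
    return kept


print(evenly_spaced_subset(6, 0.5))  # [0, 2, 4]
print(evenly_spaced_subset(4, 1.0))  # [0, 1, 2, 3]
```

A small fraction is handy for a quick end-to-end check of the pipeline before committing to a full training run.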

In [None]:
class LicensePlateDataset(object):
    def __init__(self, datastore, training=False, subset_size=None):
        self.__validate_params(training, subset_size)

        if self._subset:
            self._images = LicensePlateDataset.__create_subset(
                self._images, 
                subset_size
            )

        self._transforms_A = A.Compose(
            self._get_transform_list(),
            bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"])
        )

        self._transforms = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])
        
    def __getitem_A__(self, idx):
        image = self._images[idx]
        target = image["target"]

        transformed = self._transforms_A(
            image = skimage.io.imread(image["path"]),
            bboxes = target["boxes"],
            labels = target["labels"]
        )
        
        #bboxes = torch.round(transformed["bboxes"]).type(torch.IntTensor)
        bboxes = transformed["bboxes"]
        labels = transformed["labels"]
        target = {"boxes": bboxes, "labels": labels}
        return transformed["image"], target

    def __getitem__(self, idx):
        image = self._images[idx]
        target = image["target"]

        image = self._transforms(skimage.io.imread(image["path"]))
        
        target = {"boxes": target["boxes"], "labels": target["labels"]}
        return image, target

    def __len__(self):
        return len(self._images)

    def __validate_params(self, training, subset_size):
        self._training = bool(training)
        if self._training:
            self._images = datastore.get_train_images()
        else:
            self._images = datastore.get_valid_images()
        
        self._subset = subset_size is not None
        if not self._subset:
            self._subset_size = None
        else:
            self._subset_size = float(subset_size)
            if self._subset_size <= 0. or self._subset_size > 1.:
                raise ValueError("subset_size must be a float whose value is in (0., 1.]")
    
    def _get_transform_list(self):
        transforms = []
        if self._training and not self._subset:
            # ToDo: Implement!
            pass
        transforms.append(ToTensor())
        return transforms
    
    @classmethod
    def __create_subset(cls, images, subset_size):
        indices = range(len(images))
        _, indices = np.unique([int(idx * subset_size) for idx in indices], return_index=True)
        return [images[idx] for idx in indices]           

## <font style="color:orange">Model Evaluator</font>

[Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) supplies an `evaluate_coco(dataset, model, threshold=0.05)` function, which evaluates the model against the validation data and prints the COCO metrics. Unfortunately, this function does not return the COCO metrics, so the mAP metric cannot be logged to TensorBoard. Consequently, I modified the aforementioned function to create the `ModelEvaluator` class, which provides the following features.

* Suppresses the (print) output from the `pycocotools.cocoeval.COCOeval` support class.
* Returns a tuple of the COCO metrics and its primary metric, mean average precision (mAP).
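
The two features above can be illustrated in isolation. This sketch uses the standard library's `contextlib.redirect_stdout` rather than IPython's `io.capture_output`, and `noisy_summary` is a hypothetical stand-in for a verbose evaluation routine with placeholder values, not real results.

```python
from contextlib import redirect_stdout
from io import StringIO


def run_quietly(func, *args, **kwargs):
    """Call func, capturing anything it prints; return (result, captured_text)."""
    buffer = StringIO()
    with redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_summary():
    # Hypothetical stand-in for a routine that prints a report (not pycocotools).
    print("Average Precision summary ...")
    return [0.5, 0.75]  # placeholder metric values for illustration only


stats, log = run_quietly(noisy_summary)
mean_ap = stats[0]  # the primary metric is the first entry
print(f"mAP={mean_ap:.3f}")
```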

In [None]:
class ModelEvaluator(object):
    def __init__(self, model, dataset):
        self._model = model
        self._dataset = dataset
        
    def Evaluate(self, results_path=None, threshold=0.05):
        if results_path is None:
            results_path = f"{self._dataset.set_name}_results.json"

        self._model.eval()
        with torch.no_grad():
            results = []
            image_ids = []
            data_len = len(self._dataset)
            bar_format = "{l_bar}{bar}| {n_fmt:>4}/{total_fmt:4} [{elapsed}<{remaining}, {rate_fmt}{postfix}]"
            tqdm_pbar = tqdm(desc="CoCoEval", bar_format=bar_format, total=data_len+1, mininterval=1., unit="image")
            for data_idx, data in enumerate(self._dataset):
                tqdm_pbar.update(1)
                
                # run network
                scores, labels, bboxes = self._model(data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0))
                scores = scores.cpu()
                labels = labels.cpu()
                bboxes = bboxes.cpu()
                
                if bboxes.shape[0] > 0:
                    # correct boxes for image scale and change to COCO format
                    bboxes /= data["scale"]
                    bboxes[:, 2] -= bboxes[:, 0]
                    bboxes[:, 3] -= bboxes[:, 1]

                    for bbox_idx in range(bboxes.shape[0]):
                        score = float(scores[bbox_idx])
                        label = int(labels[bbox_idx])
                        bbox  = bboxes[bbox_idx, :]

                        # since scores are sorted, we can jump out of this loop
                        # once we reach a score that is smaller than the threshold
                        if score < threshold:
                            break

                        results.append({
                            "image_id"    : self._dataset.image_ids[data_idx],
                            "category_id" : self._dataset.label_to_coco_label(label),
                            "score"       : float(score),
                            "bbox"        : bbox.tolist(),
                        })

                # append image to list of processed images
                image_ids.append(self._dataset.image_ids[data_idx])

            mean_ap = 0.
            coco_eval = None
            if len(results) > 0:
                with io.capture_output() as captured: # suppress output
                    # write output
                    with open(results_path, 'w') as results_file:
                        json.dump(results, results_file, indent=4)

                    # load results in COCO evaluation tool
                    coco_true = self._dataset.coco
                    coco_pred = coco_true.loadRes(results_path) # why write, then read?

                    # run COCO evaluation
                    coco_eval = COCOeval(coco_true, coco_pred, "bbox")
                    coco_eval.params.imgIds = image_ids
                    coco_eval.evaluate()
                    coco_eval.accumulate()
                    coco_eval.summarize()
                    mean_ap = coco_eval.stats[0].item()
            
            tqdm_pbar.set_postfix(mAP=f"{mean_ap:.03f}")
            tqdm_pbar.update(1)
            tqdm_pbar.close()
            return coco_eval, mean_ap

#  <font style="color:green">2. Training [25 Points]</font> 

- **Write your training code in this section.**

- **You also have to share ([shared logs example](https://tensorboard.dev/experiment/JRtnsKbwTaq1ow6nPLPGeg)) the loss plot of your training using tensorboard.dev.** 

Instructions for sharing TensorBoard logs using tensorboard.dev can be found [here](https://courses.opencv.org/courses/course-v1:OpenCV+OpenCV-106+2019_T1/courseware/b1c43ffe765246658e537109e188addb/d62572ec8bd344db9aeae81235ede618/4?activate_block_id=block-v1%3AOpenCV%2BOpenCV-106%2B2019_T1%2Btype%40vertical%2Bblock%40398b46ddcd5c465fa52cb4d572ba3229).

## <font style="color:orange">"Refactoring" Monk Detector Class (via Inheritance)</font>

The Monk `Detector` class's `Train` method prints training progress to the cell's output window. This project requires the use of [TensorBoard](https://www.tensorflow.org/tensorboard/). Rather than refactor the Monk framework, I derived a new detector class, `MyDetector`, from Monk's `Detector` class that adds the following features.

* Progress is displayed using `tqdm` progress bars.
* Optimizer and LR scheduler may be specified (using `Adam` and `ReduceLROnPlateau`).
* Average loss is computed on the validation data each epoch.
* The model is saved each time the validation loss is minimized.
* Training terminates after N consecutive epochs where the validation loss increases.
* The following parameters and metrics are logged to TensorBoard.
  * learning rate
  * training class loss
  * training regression loss
  * training total loss
  * validation class loss
  * validation regression loss
  * validation total loss
  * validation mAP
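
The checkpointing and termination rules in the list above can be replayed on a list of per-epoch validation losses. This is a sketch under one assumption: an epoch counts against the limit whenever it fails to beat the best loss so far. The name `simulate_early_stopping` and the loss values are hypothetical.

```python
def simulate_early_stopping(valid_losses, patience):
    """Replay per-epoch validation losses through a patience rule.

    Returns (best_epoch, stop_epoch); stop_epoch is None if the
    losses run out before the rule triggers.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_since_best = 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            # New minimum: this is where the model would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return best_epoch, epoch
    return best_epoch, None


print(simulate_early_stopping([0.72, 0.51, 0.50, 0.53, 0.55, 0.56], patience=3))  # (2, 5)
```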

Monk's data augmentation for RetinaNet is limited: it only flips the training images horizontally at random. If I cannot reach the mAP goal of 0.50 with a simpler RetinaNet model that does not overfit the training data, I will have to implement or integrate additional augmentation transforms.
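
When an image is flipped horizontally, its bounding boxes must be mirrored as well. The sketch below shows the coordinate arithmetic for boxes in `xmin`, `ymin`, `xmax`, `ymax` format; `hflip_boxes` is a hypothetical helper, not Monk's implementation.

```python
def hflip_boxes(boxes, image_width):
    """Mirror [xmin, ymin, xmax, ymax] boxes about the vertical center line."""
    flipped = []
    for xmin, ymin, xmax, ymax in boxes:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([image_width - xmax, ymin, image_width - xmin, ymax])
    return flipped


print(hflip_boxes([[10.0, 20.0, 50.0, 80.0]], image_width=100.0))  # [[50.0, 20.0, 90.0, 80.0]]
```

Flipping twice returns the original boxes, which makes a convenient sanity check for any augmentation pipeline.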

Prior to the training loop, the validation loss is computed on the pretrained model. The following operations are performed each epoch.

* Training cycle that computes losses on the training data and performs backpropagation.
* Validation cycle that computes losses on the validation data.
* Evaluation cycle that computes mAP on the validation data.
* Scheduler update.
* Training termination check.

Since Monk does not support data subsets, the training pipeline was validated by temporary code that prematurely terminated the training and validation cycles. The `ReduceLROnPlateau` scheduler and the training termination code were tested by temporary code that simulated overfitting by monotonically increasing the validation loss.
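
The effect of a monotonically increasing validation loss on a plateau-based schedule can be previewed without training anything. The function below is a simplified model of a reduce-on-plateau rule, written for illustration; it is not PyTorch's `ReduceLROnPlateau` implementation and omits features such as cooldown and a minimum learning rate. The name `simulate_plateau_schedule` and the loss values are hypothetical.

```python
import math


def simulate_plateau_schedule(losses, lr, factor, patience, threshold=1e-4):
    """Return the learning rate in effect after each loss is reported."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best * (1.0 - threshold):  # require a relative improvement
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # tolerate `patience` bad epochs, then cut
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Simulated overfitting: the validation loss rises every epoch.
rising_losses = [0.5 + 0.01 * epoch for epoch in range(10)]
history = simulate_plateau_schedule(
    rising_losses, lr=1e-4, factor=math.sqrt(0.1), patience=3, threshold=0.001
)
for epoch, rate in enumerate(history):
    print(f"epoch {epoch}: lr={rate:.2e}")
```

With a factor of `sqrt(0.1)`, two consecutive reductions lower the learning rate by one order of magnitude.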

In [None]:
#! nvidia-smi
! pip list
#! pip install numpy
#! pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html

In [None]:
device = torch.device("cuda")
dataset = LicensePlateDataset(datastore, False)
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.to(device)
image, target = dataset[0]

#training
images = [image]
targets = [target]
images = [image.to(device) for image in images]
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
for image in images:
    print(f"image: {image.type()}")
for target in targets:
    print(f"boxes: {target['boxes'].type()}, {target['boxes']}")
    print(f"labels: {target['labels'].type()}, {target['labels']}")


optimizer = optim.Adam(model.parameters(), lr=0.0001)
scheduler = sched.ReduceLROnPlateau(optimizer, factor=math.sqrt(0.1), patience=3, threshold=0.001)
model.train()
optimizer.zero_grad()
loss_dict = model(images, targets)  # the model expects a list of images and a list of targets
print("\nTraining output")
print(loss_dict.keys())

# inferencing
images = [image]
targets = [target]
images = [image.to(device) for image in images]
targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
model.eval()
preds = model(images)
print("\nInferencing output")
print(preds[0].keys())

In [None]:
class Inferencer(object):
    # ToDo: Implement the following:
    # * Compute mAP and other CoCo metrics
    # * Run inference on a datastore image

    # output of model(images) is a list of dictionaries with the following keys.
    # * 'boxes'
    # * 'scores'
    # * 'labels'


    def __init__(self, datastore, load_model_name, batch_size=8, num_workers=8, subset_size=None):
        # https://pytorch.org/text/stable/datasets.html
        # https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
        self._device = torch.device("cuda" if torch.cuda.is_available() else "cpu")       
        self._model = self._create_model(load_model_name)
        self._valid_dataset = LicensePlateDataset(datastore, False, subset_size)
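        self._valid_dataloader = DataLoader(
            dataset = self._valid_dataset,
            batch_size = batch_size,
            num_workers = num_workers,
            shuffle = False,
            # Images and box counts vary in size, so keep each batch as tuples
            # (images, targets) instead of stacking them into single tensors.
            collate_fn = lambda batch: tuple(zip(*batch))
        )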

    def _create_model(self, load_model_name):
        # ToDo: Implement
        pretrained = load_model_name is None
        model = torchvision.models.detection.retinanet_resnet50_fpn(
            num_classes = 1, 
            pretrained_backbone = pretrained
        )
        if not pretrained:
            # https://pytorch.org/docs/stable/generated/torch.load.html
            model_path = os.path.join(model_dir, load_model_name)
            model.load_state_dict(torch.load(model_path, map_location=self._device))
        model.to(self._device)
        return model
    


class Detector(Inferencer):
    # ToDo: Implement the following:
    # * Train method
    
    def __init__(self, datastore, load_model_name=None, batch_size=8, num_workers=8, subset_size=None):
        super().__init__(datastore, load_model_name, batch_size, num_workers, subset_size)
        self._optimizer = None
        self._scheduler = None
        self._train_dataset = LicensePlateDataset(datastore, True, subset_size)
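        self._train_dataloader = DataLoader(
            dataset = self._train_dataset,
            batch_size = batch_size,
            num_workers = num_workers,
            shuffle = True,
            # Images and box counts vary in size, so keep each batch as tuples
            # (images, targets) instead of stacking them into single tensors.
            collate_fn = lambda batch: tuple(zip(*batch))
        )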

    @property
    def model_parameters(self):
        return self._model.parameters()
        
    @property
    def optimizer(self):
        return self._optimizer
    
    @optimizer.setter
    def optimizer(self, value):
        self._optimizer = value
    
    @property
    def scheduler(self):
        return self._scheduler
    
    @scheduler.setter
    def scheduler(self, value):
        self._scheduler = value
    
    def train(
        self, 
        exper_name, 
        num_epochs, 
        start_epoch=0,
        no_best_thd=0
    ):
        model = self._model
        device = self._device
        optimizer = self._optimizer
        scheduler = self._scheduler
        train_dataloader = self._train_dataloader
        valid_dataloader = self._valid_dataloader

        def create_tqdm_iter(desc, dataloader):
            bar_format = "{l_bar}{bar}| {n_fmt:>4}/{total_fmt:4} [{elapsed}<{remaining}, {rate_fmt}{postfix}]"
            return tqdm(dataloader, bar_format=bar_format, desc=desc, mininterval=1., unit="batch")

        def train_cycle(tqdm_desc, dataloader):
            model.train()

            cstats = self.SimpleStatistics()
            rstats = self.SimpleStatistics()
            tstats = self.SimpleStatistics()
            tqdm_dataloader = create_tqdm_iter(tqdm_desc, dataloader)
            for iter_num, (images, targets) in enumerate(tqdm_dataloader):
                optimizer.zero_grad()
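                # torchvision detection models expect a list of image tensors and a list of target dicts
                images = [image.to(device) for image in images]
                targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
                output = model(images, targets)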
                closs = output["classification"]
                rloss = output["bbox_regression"]
                
                closs = closs.mean()
                rloss = rloss.mean()
                tloss = closs + rloss

                tloss.backward()
                torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)

                optimizer.step()

                cstats.add(float(closs))
                rstats.add(float(rloss))
                tstats.add(float(tloss))
                loss_text = f"{{C:{cstats.mean:.3f}, R:{rstats.mean:.3f}, T:{tstats.mean:.3f}}}"
                tqdm_dataloader.set_postfix(losses=loss_text)

                del closs
                del rloss

            tqdm_dataloader.close()
            return cstats.mean, rstats.mean, tstats.mean

        def valid_cycle(tqdm_desc, dataloader):
            # Note: If model is set to evaluation mode, then it returns scores, labels and bboxes.
            #       We want classification and regression loss, so we set it to training mode and
            #       use torch.no_grad() to avoid gradient computation.
            model.train()

            cstats = self.SimpleStatistics()
            rstats = self.SimpleStatistics()
            tstats = self.SimpleStatistics()
            tqdm_dataloader = create_tqdm_iter(tqdm_desc, dataloader)
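            for iter_num, (images, targets) in enumerate(tqdm_dataloader):
                with torch.no_grad():
                    # torchvision detection models expect a list of image tensors and a list of target dicts
                    images = [image.to(device) for image in images]
                    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
                    output = model(images, targets)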
                    closs = output["classification"]
                    rloss = output["bbox_regression"]

                closs = closs.mean()
                rloss = rloss.mean()
                tloss = closs + rloss

                cstats.add(float(closs))
                rstats.add(float(rloss))
                tstats.add(float(tloss))
                loss_text = f"{{C:{cstats.mean:.3f}, R:{rstats.mean:.3f}, T:{tstats.mean:.3f}}}"
                tqdm_dataloader.set_postfix(losses=loss_text)

                del closs
                del rloss

            tqdm_dataloader.close()
            return cstats.mean, rstats.mean, tstats.mean        
            
        min_valid_loss = 1e10
        epochs_since_best = 0
        save_model_name = exper_name + ".pt"
        writer = SummaryWriter(log_dir=os.path.join(runs_dir, exper_name))
        for epoch_num in range(start_epoch, num_epochs):

            train_closs, train_rloss, train_tloss = train_cycle(
                f"Train {epoch_num:02d}", 
                train_dataloader
            )
            writer.add_scalar("training/class_loss", train_closs, epoch_num)
            writer.add_scalar("training/regression_loss", train_rloss, epoch_num)
            writer.add_scalar("training/total_loss", train_tloss, epoch_num)
            
            valid_closs, valid_rloss, valid_tloss = valid_cycle(
                f"Valid {epoch_num:02d}", 
                valid_dataloader
            )
            writer.add_scalar("validation/class_loss", valid_closs, epoch_num)
            writer.add_scalar("validation/regression_loss", valid_rloss, epoch_num)
            writer.add_scalar("validation/total_loss", valid_tloss, epoch_num)
            
            coco_eval, mean_ap = ModelEvaluator(model, self._valid_dataset).Evaluate()
            writer.add_scalar("test/mAP", mean_ap, epoch_num)

            writer.add_scalar("learning_rate", optimizer.param_groups[0]['lr'], epoch_num)
            
            if isinstance(scheduler, sched.ReduceLROnPlateau):
                scheduler.step(valid_tloss)
            elif scheduler is not None:
                scheduler.step()

            if min_valid_loss > valid_tloss:
                epochs_since_best = 0
                min_valid_loss = valid_tloss
                self._save_model(save_model_name)
            else:
                epochs_since_best += 1
                if no_best_thd is not None and epochs_since_best >= no_best_thd:
                    break
        
        writer.close()
        model.eval()
    
    def _save_model(self, save_model_name):
        # https://pytorch.org/docs/stable/generated/torch.save.html
        os.makedirs(model_dir, exist_ok=True)  # make sure the models directory exists
        model_path = os.path.join(model_dir, save_model_name)
        self._model.to(torch.device("cpu"))
        torch.save(self._model.state_dict(), model_path)        
        self._model.to(self._device)

    class SimpleStatistics(object):
        def __init__(self):
            self._n = 0
            self._total = 0.

        @property
        def n(self):
            return self._n

        @property
        def mean(self):
            if self._n == 0:
                return 0.
            return self._total / self._n

        def add(self, x):
            self._n += 1
            self._total += x

## <font style="color:orange">Training Experiments</font>

Training experiments are conducted via the `conduct_experiment(model_name, exper_name)` function. The `exper_name` parameter is the experiment name; the trained model is saved under that name with a `.pt` extension. The `model_name` parameter may be one of the following values.

* resnet18
* resnet34
* resnet50
* resnet101
* resnet152

TQDM progress bars are used to display training progress. The following output shows the first three epochs of training the resnet18 model.

```
Valid 00: 100%|██████████|   39/39   [00:23<00:00,  1.66batch/s, losses={C:1.034, R:0.979, T:2.012}]
Train 01: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.315, R:0.508, T:0.823}]
Valid 01: 100%|██████████|   39/39   [00:23<00:00,  1.69batch/s, losses={C:0.290, R:0.426, T:0.716}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.99image/s, mAP=0.352]
Train 02: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.174, R:0.333, T:0.507}]
Valid 02: 100%|██████████|   39/39   [00:23<00:00,  1.67batch/s, losses={C:0.181, R:0.328, T:0.509}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.97image/s, mAP=0.431]
Train 03: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.132, R:0.282, T:0.415}]
Valid 03: 100%|██████████|   39/39   [00:22<00:00,  1.72batch/s, losses={C:0.203, R:0.298, T:0.501}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.98image/s, mAP=0.447]
```
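
Each `losses={C:…, R:…, T:…}` postfix reports the mean classification, regression, and total loss over the batches seen so far. The following is a minimal sketch of how such a postfix could be assembled with a running-mean accumulator similar to `SimpleStatistics`. The names `RunningMean` and `format_losses` are hypothetical, and the sketch assumes the total loss is the sum of the other two, which is consistent with the log above up to rounding.

```python
class RunningMean:
    """Accumulates values and reports their mean (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def add(self, x):
        self.n += 1
        self.total += x

    @property
    def mean(self):
        # an empty accumulator reports 0 rather than dividing by zero
        return self.total / self.n if self.n else 0.0


def format_losses(class_losses, regression_losses):
    """Build a progress-bar postfix from per-batch losses."""
    c_stat, r_stat, t_stat = RunningMean(), RunningMean(), RunningMean()
    for c, r in zip(class_losses, regression_losses):
        c_stat.add(c)
        r_stat.add(r)
        t_stat.add(c + r)  # assumption: total = classification + regression
    return f"{{C:{c_stat.mean:.3f}, R:{r_stat.mean:.3f}, T:{t_stat.mean:.3f}}}"


postfix = format_losses([0.30, 0.34], [0.50, 0.52])
print(postfix)
```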

The TensorBoard logs may be viewed by clicking this [link]().
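
The experiment cell below pairs Adam with `ReduceLROnPlateau` using `factor=math.sqrt(0.1)`, so two reductions lower the learning rate by a factor of ten. The following plain-Python sketch approximates that plateau rule to show how the learning rate would evolve for a given sequence of validation losses. It is a simplification (relative threshold, no cooldown), not PyTorch's implementation, and `plateau_schedule` is a hypothetical name.

```python
import math


def plateau_schedule(losses, lr=1e-4, factor=math.sqrt(0.1), patience=3, threshold=1e-3):
    """Return the learning rate in effect after each epoch's validation loss."""
    best = math.inf
    bad_epochs = 0
    history = []
    for loss in losses:
        # an epoch counts as an improvement only if it beats the best loss
        # by more than the relative threshold
        if loss < best * (1.0 - threshold):
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # reduce the learning rate once patience is exceeded
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# two improving epochs followed by a long plateau
history = plateau_schedule([0.7, 0.5] + [0.5] * 8)
print(history)
```

With these settings the loss has to stall for four consecutive epochs before each reduction.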

In [None]:
def conduct_experiment(exper_name):
    detector = Detector(datastore)
    detector.optimizer = optim.Adam(detector.model_parameters, lr=0.0001)
    detector.scheduler = sched.ReduceLROnPlateau(detector.optimizer, factor=math.sqrt(0.1), patience=3, threshold=0.001)
    detector.train(num_epochs=1, exper_name=exper_name)   
    del detector
    torch.cuda.empty_cache()


# run an experiment (comment out the following line to skip training)
conduct_experiment("RetinaNet")

#  <font style="color:green">3. Inference [15 Points]</font> 

**You have to make predictions from your trained model on three images from the validation dataset.**

The plotted images should be similar to the following:

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p1.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p2.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p3.png'>




## <font style="color:orange">"Refactoring" Monk Inference Class (via Inheritance)</font>

The `Predict` method of Monk's `Infer` class only accepts a path to an image. It also does not annotate images as nicely as my `Annotator`. Rather than refactor the Monk framework, I derived a new inference class, `MyInferencer`, from Monk's `Infer` class that adds the following features.

* Predictions may be performed on RGB and BGR in-memory images.
* Predictions and result annotation may be performed on validation samples.
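
Predicting on an in-memory image means `PredictViaImageRGB` has to repeat the resize-and-pad step that the dataset transforms normally perform. The arithmetic can be sketched in isolation as follows. The function name `resize_and_pad_shape` is hypothetical, and the `min_side=608` and `max_side=1024` defaults are assumed values for illustration; the class reads the real ones from `system_dict`.

```python
def resize_and_pad_shape(rows, cols, min_side=608, max_side=1024, multiple=32):
    """Return (scale, resized_shape, padded_shape) for a rows x cols image."""
    # scale so that the shorter side reaches min_side ...
    scale = min_side / min(rows, cols)
    # ... unless that would push the longer side past max_side
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    new_rows, new_cols = int(round(rows * scale)), int(round(cols * scale))
    # pad each side up to a multiple of `multiple`; like PredictViaImageRGB,
    # this adds a full extra block when a side is already an exact multiple
    pad_rows = multiple - new_rows % multiple
    pad_cols = multiple - new_cols % multiple
    return scale, (new_rows, new_cols), (new_rows + pad_rows, new_cols + pad_cols)


scale, resized, padded = resize_and_pad_shape(720, 1280)
# boxes predicted on the resized image map back to the original by dividing by scale
box_resized = [80.0, 160.0, 240.0, 200.0]
box_original = [v / scale for v in box_resized]
print(scale, resized, padded, box_original)
```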

In [None]:
class MyInferencer(Infer):
    def __init__(self, image_dir, annotator, verbose=0):
        super().__init__(verbose)
        self.system_dict["dataset"] = {}
        self.system_dict["dataset"]["val"] = {}
        self.__image_dir = image_dir
        self.__annotator = annotator
        self.__model_evaluator = None
        
    @property
    def ModelEvaluator(self):
        if self.__model_evaluator is None:
            self.__model_evaluator = ModelEvaluator(
                self.system_dict["local"]["model"],
                self.system_dict["local"]["dataset_val"]
            )
        return self.__model_evaluator

    def Val_Dataset(self, root_dir, coco_dir, img_dir, set_dir):
        self.system_dict["dataset"]["val"]["status"] = True
        self.system_dict["dataset"]["val"]["root_dir"] = root_dir
        self.system_dict["dataset"]["val"]["coco_dir"] = coco_dir
        self.system_dict["dataset"]["val"]["img_dir"] = img_dir
        self.system_dict["dataset"]["val"]["set_dir"] = set_dir

        self.system_dict["local"]["dataset_val"] = CocoDataset(
            self.system_dict["dataset"]["val"]["root_dir"] + "/" + self.system_dict["dataset"]["val"]["coco_dir"], 
            img_dir = self.system_dict["dataset"]["val"]["img_dir"], 
            set_dir = self.system_dict["dataset"]["val"]["set_dir"],
            transform = transforms.Compose([Normalizer(), Resizer()])
        )

    def PredictViaImageRGB(self, img):
        # normalize image
        image = img.astype(np.float32) / 255.
        image = (image.astype(np.float32) - self.system_dict["local"]["mean"]) / self.system_dict["local"]["std"]
       
        # compute scaling factor
        rows, cols, cns = image.shape
        smallest_side = min(rows, cols)
        scale = self.system_dict["local"]["min_side"] / smallest_side
        largest_side = max(rows, cols)
        if largest_side * scale > self.system_dict["local"]["max_side"]:
            scale = self.system_dict["local"]["max_side"]  / largest_side

        # resize and pad the image
        image = skimage.transform.resize(image, (int(round(rows*scale)), int(round((cols*scale)))))
        rows, cols, cns = image.shape
        pad_rows = 32 - rows % 32
        pad_cols = 32 - cols % 32
        new_image = np.zeros((rows + pad_rows, cols + pad_cols, cns)).astype(np.float32)
        new_image[:rows, :cols, :] = image.astype(np.float32)

        # convert image to tensor and perform prediction
        img = torch.from_numpy(new_image)
        with torch.no_grad():
            model = self.system_dict["local"]["model"]
            scores, labels, boxes = model(img.cuda().permute(2, 0, 1).float().unsqueeze(dim=0))
            boxes /= scale

        return scores, labels, boxes
        
    def PredictViaImageBGR(self, img):
        return self.PredictViaImageRGB(img[:, :, ::-1])
        
    def PredictViaImageName(self, img_name):
        img_path = os.path.join(self.__image_dir, img_name)
        return self.PredictViaImageRGB(skimage.io.imread(img_path))
    
    def PredictAndAnnotateSample(self, sample, annotator, vis_threshold=0.4):
        img, _ = sample
        img_name = img["file_name"]
        scores, _, bboxes = self.PredictViaImageName(img_name)
        return self.__annotator.annotate_image_name_pred(
            img_name = img_name, 
            bboxes = bboxes, 
            scores = scores, 
            vis_threshold = vis_threshold)

## <font style="color:orange">Detecting and Annotating License Plates in Sample Validation Images</font>

The following cell creates an "inferencer", loads the trained model, and retrieves the same three validation samples used in Section 1 to plot ground truth bounding boxes. Each of the subsequent three cells performs a prediction on one sample and annotates the detected license plates.

In [None]:
exper_name = "RetinaNet18"
inferencer = MyInferencer(dataset_helper.valid_image_dir, Annotator(dataset_helper.valid_image_dir))
inferencer.Model(model_path=f"{exper_name}.pt")
samples = dataset_helper.get_valid_samples([4, 5, 6])

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[0], annotator))

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[1], annotator))

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[2], annotator))

#  <font style="color:green">4. COCO Detection Evaluation [25 Points]</font> 

**You have to evaluate your detection model using the COCO detection evaluation metric.**

For your reference, here is the COCO evaluation metric chart:


---

<img src="https://www.learnopencv.com/wp-content/uploads/2020/03/c3-w9-coco_metric.png">

---

#### <font style="color:red">The expected `AP` (primary challenge metric) is more than `0.5`.</font>

**The expected output should look similar to the following:**

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.886
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.629
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.256
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.627
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.629
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.704
```
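
For orientation, the `IoU=0.50:0.95` label in the summary means that precision is averaged over ten intersection-over-union thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of the IoU computation and those thresholds. It assumes boxes in corner format `(x1, y1, x2, y2)` for brevity (COCO annotation files store `[x, y, width, height]`), the names `iou` and `thresholds_matched` are hypothetical, and the sketch leaves out the detection-to-ground-truth matching that the actual COCO evaluation performs.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # the intersection is empty when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# the ten thresholds behind "IoU=0.50:0.95"
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred_box, gt_box):
    """Thresholds at which a prediction would count as overlapping the ground truth."""
    value = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if value >= t]


print(thresholds_matched((1, 0, 11, 10), (0, 0, 10, 10)))
```

A prediction that is slightly offset from the ground truth passes the looser thresholds but not the stricter ones, which is why the `IoU=0.50` AP is higher than the `IoU=0.50:0.95` AP.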



## <font style="color:orange">Detection Model Evaluation</font>

The following cell initializes the inferencer's validation dataset and requests its `ModelEvaluator`. It then calls the model evaluator's `Evaluate` method and summarizes the resulting COCO evaluation.

In [None]:
with io.capture_output() as captured: # suppress output
    inferencer.Val_Dataset(
        root_dir = data_dir,          # path to root directory containing coco_dir
        coco_dir = "license_plates",  # name of coco_dir containing image and annotation folders
        img_dir = "./",               # name of folder containing all training and validation folders
        set_dir = "images_valid"
    )

evaluator = inferencer.ModelEvaluator
coco_eval, _ = evaluator.Evaluate()   
coco_eval.summarize()

# <font style="color:green">5. Run Inference on a Video [15 Points]</font>

#### [Download the Input Video](https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1)

**You have to run inference on a video.** 

You can download the video from [here](https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1).

#### <font style="color:red">Upload the output video to YouTube and share the link. Do not upload the video in the lab.</font>

In [None]:
from IPython.display import YouTubeVideo, display
video = YouTubeVideo("18HWHCevFdU", width=640, height=360)
display(video)

**Your output video should have a bounding box around the vehicle registration plate.**

In [None]:
video = YouTubeVideo("5SgCuee7AMs", width=640, height=360)
display(video)

## <font style="color:orange">Detecting and Annotating a Video</font>

I modified the sample function to read the source video frame-by-frame, detect and annotate license plates in each frame, and create an output video. I uploaded this video to YouTube and it may be viewed in this notebook's last cell.
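
Only sufficiently confident detections are worth drawing on a frame. `PredictAndAnnotateSample` exposes this as `vis_threshold=0.4`. The following is a minimal sketch of such a filter. It assumes scores and boxes arrive as parallel sequences, and the name `filter_detections` and the inclusive comparison are illustrative assumptions rather than the `Annotator`'s actual behavior.

```python
def filter_detections(scores, bboxes, vis_threshold=0.4):
    """Keep only the (score, box) pairs whose score reaches the threshold."""
    return [(s, b) for s, b in zip(scores, bboxes) if s >= vis_threshold]


scores = [0.91, 0.35, 0.40]
bboxes = [(10, 20, 110, 60), (200, 50, 260, 80), (300, 300, 380, 340)]
kept = filter_detections(scores, bboxes)
print(kept)
```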

In [None]:
def annotate_video(src_video_path, dst_video_path, frames_dir=None):
    """
    Create a new video by annotating license plates in the source video.
    
    src_video_path (str): path of the source video to annotate
    dst_video_path (str): path of the annotated video to create
    frames_dir (str, optional): if specified, directory in which to write annotated frames
    """
    
    # create a video reader
    reader = cv2.VideoCapture(src_video_path)
    if not reader.isOpened(): 
        print("Error opening video file")
        return
        
    # get source video attributes
    width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frames_per_second = float(reader.get(cv2.CAP_PROP_FPS))
    
    # create a video writer
    writer = cv2.VideoWriter(
        filename = dst_video_path,
        fourcc = cv2.VideoWriter_fourcc(*"mp4v"),
        fps =  frames_per_second,
        frameSize = (width, height),
        isColor=True,
    )
    
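    # if specified, make sure the frames directory exists (cv2.imwrite does not create it)
    if frames_dir is not None:
        os.makedirs(frames_dir, exist_ok=True)

    # for each frame in the source video ...
    count = 0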
    while reader.isOpened():
        # read the frame
        ret, frame = reader.read()
        if ret:
            # detect and annotate license plates
            scores, _, bboxes = inferencer.PredictViaImageBGR(frame)
            frame = annotator.annotate_image_bgr_pred(frame, bboxes, scores, color=(0, 140, 255))
            # write frame to destination video
            writer.write(frame)
            # if specified, write frame to frames directory
            if frames_dir is not None:
                name = f"frame{count:04d}.jpg"
                path = os.path.join(frames_dir, name)
                cv2.imwrite(path, frame)
            count += 1
        else:
            break
    
    # free reader and writer resources
    reader.release()
    writer.release()
    
    return

In [None]:
# ! wget -O project3-input-video.mp4 https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1
annotate_video("project3-input-video.mp4", "project3-output-video.mp4", "./Frames")

In [None]:
#video = YouTubeVideo("", width=640, height=360)
#display(video)

## <font style="color:orange">Conclusion</font>

The RetinaNet model with the Feature Pyramid Network backbone on top of a feedforward ResNet-## architecture achieved a mAP score of 0.5#.

The utility of the [Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) framework was questionable. Several blog posts advertise building a custom object detector in 5 lines of code. Technically, this is possible, but one is then at the mercy of the framework creators. For this project, I rewrote most of the training, validation, testing, and visualization code. Fortunately, I did not have to modify the RetinaNet implementation. Monk has potential, but it needs significant refactoring to allow basic customization, e.g., specifying an optimizer and scheduler, or computing the average loss on the validation set to ascertain whether the model is overfitting.