# <font style="color:blue">Project 3: Object Detection</font>


#### Maximum Points: 100

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Section</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>Plot Ground Truth Bounding Boxes</h3></td> <td><h3>20</h3></td> </tr>
        <tr><td><h3>2</h3></td> <td><h3>Training</h3></td> <td><h3>25</h3></td> </tr>
        <tr><td><h3>3</h3></td> <td><h3>Inference</h3></td> <td><h3>15</h3></td> </tr>
        <tr><td><h3>4</h3></td> <td><h3>COCO Detection Evaluation</h3></td> <td><h3>25</h3></td> </tr>
        <tr><td><h3>5</h3></td> <td><h3>Run Inference on a Video</h3></td> <td><h3>15</h3></td> </tr>
    </table>
</div>


# <font style="color:orange">Project Approach</font>

In the last project, I wrote a framework that allowed me to quickly create classification experiments. My motivation was not only to fulfill the project's requirements, but also to learn how to program in Python. In this project, I want to use an existing framework to minimize the amount of code I write. I decided to use [Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection)'s [RetinaNet](https://www.paperswithcode.com/method/retinanet) implementation, which is built upon [PyTorch RetinaNet](https://github.com/yhenon/pytorch-retinanet).

![RetinaNet network architecture](https://www.paperswithcode.com/media/methods/Screen_Shot_2020-06-07_at_4.22.37_PM.png)

# <font style="color:orange">Monk Installation</font>

To install Monk, run the following commands.

```
git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git
cd Monk_Object_Detection/5_pytorch_retinanet/installation && cat requirements_brule.txt | xargs -n 1 -L 1 pip install
```

The output should be similar to the following.

```
Cloning into 'Monk_Object_Detection'...
remote: Enumerating objects: 10525, done.
remote: Total 10525 (delta 0), reused 0 (delta 0), pack-reused 10525
Receiving objects: 100% (10525/10525), 260.87 MiB | 539.00 KiB/s, done.
Resolving deltas: 100% (4692/4692), done.
Checking out files: 100% (8428/8428), done.
Collecting cython
  Downloading Cython-0.29.21-cp37-cp37m-manylinux1_x86_64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 17.8 MB/s eta 0:00:01
Installing collected packages: cython
Successfully installed cython-0.29.21
Collecting efficientnet_pytorch
  Downloading efficientnet_pytorch-0.7.0.tar.gz (20 kB)
...
```

**Note:** `requirements.txt` is outdated, and installing it caused a conflict between PyTorch and TorchVision. I created `requirements_brule.txt` to install only the modules that are not already installed on my _Deep Learning with PyTorch_ Docker image. I suspect `requirements_colab.txt` is also outdated.

In [None]:
#! git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git
#! cd Monk_Object_Detection/5_pytorch_retinanet/installation && cat requirements_brule.txt | xargs -n 1 -L 1 pip install

# <font style="color:purple">Download the Dataset</font> 

**[Download the Vehicle Registration Plate dataset](https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1)**


Download the Vehicle Registration Plate dataset from [here](https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1) and unzip it. 

We will have the following directory structure:

```
Dataset
├── train
│   └── Vehicle registration plate
│       └── Label
└── validation
    └── Vehicle registration plate
        └── Label
```

Unzipping the file produces a directory named `Dataset`. This directory has two folders, `train` and `validation`. Each of them contains a `Vehicle registration plate` folder with the `.jpg` images and a `Label` folder, which holds the bounding box data for the images.


For example, the label file for the image `Dataset/train/Vehicle registration plate/bf4689922cdfd532.jpg` is `Dataset/train/Vehicle registration plate/Label/bf4689922cdfd532.txt`.

There are one or more lines in each `.txt` file. Each line represents one bounding box.
For example,
```
Vehicle registration plate 385.28 445.15 618.24 514.225
Vehicle registration plate 839.68 266.066462 874.24 289.091462
```

This is a single-class detection problem (`Vehicle registration plate`), and every line starts with that three-word class name, so the bounding box coordinates start in the fourth space-separated column of each row.

The coordinates are in `xmin`, `ymin`, `xmax`, `ymax` format, in pixels.
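As an illustration of the label format, the sketch below parses one label line into a COCO-style `[x, y, width, height]` box. It assumes the class name is always the three tokens `Vehicle registration plate` and that the coordinates are absolute pixel values; `parse_label_line` is a hypothetical helper, not part of Monk or the dataset.

```python
def parse_label_line(line):
    """Parse 'Vehicle registration plate xmin ymin xmax ymax' into a COCO bbox.

    Hypothetical helper: assumes the class name occupies the first three
    whitespace-separated tokens, so the coordinates start at index 3.
    """
    tokens = line.split()
    xmin, ymin, xmax, ymax = (float(t) for t in tokens[3:7])
    if xmax <= xmin or ymax <= ymin:
        raise ValueError(f"degenerate box: {line!r}")
    # COCO stores boxes as [x, y, width, height]
    return [xmin, ymin, xmax - xmin, ymax - ymin]


bbox = parse_label_line("Vehicle registration plate 385.28 445.15 618.24 514.225")
print(bbox)
```

Splitting on arbitrary whitespace (rather than a single space) also tolerates the trailing newline and any doubled spaces in a label file.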

**The dataset has `5308` training and `386` validation images.**

The data comes from the [Open Images Dataset](https://storage.googleapis.com/openimages/web/index.html).

## <font style="color:orange">Download/Unzip Commands</font>

To download and unzip the dataset, run the following commands.

```
wget -O Dataset.zip "https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1"
unzip Dataset.zip
rm Dataset.zip
```

The output should be similar to the following.


```
--2021-02-10 19:58:37--  https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
...
HTTP request sent, awaiting response... 200 OK
Length: 1764437533 (1.6G) [application/binary]
Saving to: ‘Dataset.zip’

Dataset.zip         100%[===================>]   1.64G  25.2MB/s    in 45s     

2021-02-10 19:59:24 (37.2 MB/s) - ‘Dataset.zip’ saved [1764437533/1764437533]

Archive:  Dataset.zip
   creating: Dataset/
   creating: Dataset/validation/
   creating: Dataset/validation/Vehicle registration plate/
  inflating: Dataset/validation/Vehicle registration plate/52ceb1fc30b413e5.jpg  
  inflating: Dataset/validation/Vehicle registration plate/182268e1f8c6525f.jpg
  ...
```

In [None]:
#! wget -O Dataset.zip https://www.dropbox.com/s/k81ljpmzy3fgtx9/Dataset.zip?dl=1
#! unzip Dataset.zip
#! rm Dataset.zip

In [None]:
import collections
import json
import math
import os
import shutil
import sys

import cv2
import matplotlib.pyplot as plt
import numpy as np
import skimage.color
import skimage.io
import torch
import torch.optim as optim
import torch.optim.lr_scheduler as sched
from IPython.utils import io
from PIL import Image
from pycocotools.cocoeval import COCOeval
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms
from tqdm import tqdm

sys.path.append("Monk_Object_Detection/5_pytorch_retinanet/lib/")
from infer_detector import Infer
from train_detector import Detector
from retinanet.dataloader import CocoDataset, Resizer, Normalizer

## <font style="color:orange">Convert Dataset to COCO Format</font>

[Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) supports datasets in two formats: CSV and COCO. The `DatasetHelper` class converts the _Vehicle Registration Plate_ dataset to COCO Format 2.

**Note:** This conversion is destructive. It replaces the original dataset with the following directory structure.

```
                                            Monk Parameters
                                            ------------------------
Dataset                                  →  root_dir
└── license_plates                       →  coco_dir
    ├── images_train                     →  set_dir
    │   ├── 00009e5b390986a0.jpg
    │   ├── 000228608388803f.jpg
    │   └── ........ (and so on)
    ├── images_valid                     →  set_dir
    │   ├── 003a5aaf6d17c917.jpg
    │   ├── 00723dac8201a83e.jpg
    │   └── ........ (and so on)
    └── annotations
        ├── instances_images_train.json  →  instances_<set_dir>.json
        ├── instances_images_valid.json  →  instances_<set_dir>.json
        └── classes.txt
```
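The parameter mapping in the tree above can be expressed as a small path helper. This is a sketch under the assumption that the paths are joined as the arrows indicate, with the image path following the same join order as `load_image` later in this notebook (root, `img_dir`, set name); `coco_paths` is a hypothetical name used only for illustration.

```python
import os


def coco_paths(root_dir, coco_dir, set_dir, img_dir="./"):
    """Hypothetical helper: derive the paths implied by the Monk parameters above.

    img_dir="./" means the set directory sits directly under coco_dir.
    """
    base = os.path.join(root_dir, coco_dir)
    annotations = os.path.join(base, "annotations")
    return {
        # normpath removes the "./" component introduced by img_dir
        "image_dir": os.path.normpath(os.path.join(base, img_dir, set_dir)),
        "annotation_file": os.path.join(annotations, f"instances_{set_dir}.json"),
        "class_list": os.path.join(annotations, "classes.txt"),
    }


paths = coco_paths("Dataset", "license_plates", "images_valid")
print(paths["annotation_file"])
```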

In [None]:
runs_dir = "./runs"       # the root directory for TensorBoard logs
data_dir = "./Dataset"    # the root directory for the COCO dataset


class DatasetHelper(object):
    def __init__(self, data_dir):
        self.__data_dir = data_dir
        self.__orig_train_dir = os.path.join(data_dir, "train")
        self.__orig_train_image_dir = os.path.join(data_dir, "train", "Vehicle registration plate")
        self.__orig_train_label_dir = os.path.join(data_dir, "train", "Vehicle registration plate", "Label")
        self.__orig_valid_dir = os.path.join(data_dir, "validation")
        self.__orig_valid_image_dir = os.path.join(data_dir, "validation", "Vehicle registration plate")
        self.__orig_valid_label_dir = os.path.join(data_dir, "validation", "Vehicle registration plate", "Label")
        self.__coco_annotations_dir = os.path.join(data_dir, "license_plates", "annotations")
        self.__coco_train_image_dir = os.path.join(data_dir, "license_plates", "images_train")
        self.__coco_valid_image_dir = os.path.join(data_dir, "license_plates", "images_valid")
        self.__coco_class_list_path = os.path.join(self.__coco_annotations_dir, "classes.txt")
        self.__coco_train_annotation_path = os.path.join(self.__coco_annotations_dir, "instances_images_train.json")
        self.__coco_valid_annotation_path = os.path.join(self.__coco_annotations_dir, "instances_images_valid.json")

    def __is_data_set_valid(self):
        paths_and_counts = [
            (self.__data_dir, 2),
            (self.__orig_train_dir, 1),
            (self.__orig_train_image_dir, 5309),
            (self.__orig_train_label_dir, 5308),
            (self.__orig_valid_dir, 1),
            (self.__orig_valid_image_dir, 387),
            (self.__orig_valid_label_dir, 386)
        ]
       
        for path, count in paths_and_counts:
            if not os.path.isdir(path) or len(os.listdir(path)) != count:
                return False

        return True

    def __process_image(self, img_name, img_id, img_path):
        # PIL reads the header lazily, so only the image size is loaded here
        with Image.open(img_path) as img:
            width, height = img.size
        return {
            "file_name": img_name,
            "height": height,
            "id": img_id,
            "width": width
        }

    def __process_annotations(self, img_id, ann_id, ann_path):
        anns = []
        with open(ann_path, 'r') as f:
            for line in f:
                tokens = line.split()
                if not tokens:
                    continue  # skip blank lines
                # tokens 0-2 hold the class name "Vehicle registration plate"
                xmin, ymin, xmax, ymax = (float(t) for t in tokens[3:7])
                assert xmax > xmin
                assert ymax > ymin
                width = xmax - xmin
                height = ymax - ymin
                anns.append({
                    "area": width * height,
                    "bbox": [xmin, ymin, width, height],
                    "category_id": 0,
                    "id": ann_id,
                    "ignore": 0,
                    "image_id": img_id,
                    "iscrowd": 0,
                    "segmentation": []
                })
                ann_id += 1
        return anns
        
    def __process_data(self, orig_image_dir, orig_label_dir, coco_image_dir, coco_annotation_path):
        json_dict = {
            "type": "instances", 
            "images": [], 
            "categories": [{"supercategory": "none", "name": "license plate", "id": 0}],
            "annotations": []
        }
        
        img_id = 0
        ann_id = 0
        for img_name in os.listdir(orig_image_dir):
            orig_image_path = os.path.join(orig_image_dir, img_name)
            orig_label_path = os.path.join(orig_label_dir, os.path.splitext(img_name)[0] + ".txt")
            coco_image_path = os.path.join(coco_image_dir, img_name)
            
            if os.path.isfile(orig_image_path):
                # create image and annotation JSON objects
                img = self.__process_image(img_name, img_id, orig_image_path)
                anns = self.__process_annotations(img_id, ann_id, orig_label_path)
                json_dict["images"].append(img)
                json_dict["annotations"].extend(anns)
                img_id += 1
                ann_id += len(anns)

                # move image file into the COCO image directory
                os.rename(orig_image_path, coco_image_path)

        # create json annotation file
        with open(coco_annotation_path, 'w') as f:
            json.dump(json_dict, f)
        
    def __create_class_list(self):
        f = open(self.__coco_class_list_path, 'w')
        f.write("license plate")
        f.close()
    
    def convert_to_coco(self):
        if not self.__is_data_set_valid():
            raise IOError(f"'{self.__data_dir}' is not a valid Vehicle Registration Plate dataset.")

        # create COCO directories
        os.makedirs(self.__coco_annotations_dir)
        os.makedirs(self.__coco_train_image_dir)
        os.makedirs(self.__coco_valid_image_dir)

        # create class list & training/validation annotation files
        self.__create_class_list()
        self.__process_data(
            self.__orig_train_image_dir, 
            self.__orig_train_label_dir, 
            self.__coco_train_image_dir,
            self.__coco_train_annotation_path
        )
        self.__process_data(
            self.__orig_valid_image_dir, 
            self.__orig_valid_label_dir, 
            self.__coco_valid_image_dir,
            self.__coco_valid_annotation_path
        )
        
        # remove original directories
        shutil.rmtree(self.__orig_train_dir)
        shutil.rmtree(self.__orig_valid_dir)

    @property
    def valid_image_dir(self):
        return self.__coco_valid_image_dir
        
    def __get_samples(self, annotation_path, ids):
        f = open(annotation_path, 'r')
        data = json.load(f)
        f.close()
        imgs = [img for img in data["images"] if img["id"] in ids]
        anns = [ann for ann in data["annotations"] if ann["image_id"] in ids]
        return [(img, [ann for ann in anns if ann["image_id"] == img["id"]]) for img in imgs]
            
    def get_train_samples(self, ids):
        return self.__get_samples(self.__coco_train_annotation_path, ids)

    def get_valid_samples(self, ids):
        return self.__get_samples(self.__coco_valid_annotation_path, ids)

dataset_helper = DatasetHelper(data_dir)
try:
    dataset_helper.convert_to_coco()
    print("Converted dataset.")
except OSError:
    # raised when the original layout is gone or the COCO directories already exist
    print("Dataset already converted.")

## <font style="color:orange">Dataset Analysis</font>

It is useful to understand the dataset's image dimensions when creating transforms. The following cell visualizes the image widths, heights, and aspect ratios as histograms.
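Image dimensions matter because Monk's RetinaNet pipeline rescales every image with its `Resizer` transform, and `ModelEvaluator` later divides the predicted boxes by the resulting `data["scale"]`. The sketch below reproduces the scale rule of the upstream pytorch-retinanet `Resizer` as I understand it: scale the shortest side to `min_side`, unless that pushes the longest side past `max_side`. The defaults (608 and 1024), the omission of the upstream padding step, and the name `resizer_scale` are assumptions, so check `retinanet/dataloader.py` before relying on exact numbers.

```python
def resizer_scale(width, height, min_side=608, max_side=1024):
    """Sketch of the scale rule assumed for Monk/pytorch-retinanet's Resizer.

    Returns the scale factor and the resized (width, height), before any padding.
    """
    # Scale the shortest side up or down to min_side.
    scale = min_side / min(width, height)
    # Cap the scale so the longest side does not exceed max_side.
    if max(width, height) * scale > max_side:
        scale = max_side / max(width, height)
    new_size = (int(round(width * scale)), int(round(height * scale)))
    return scale, new_size


print(resizer_scale(1024, 768))
print(resizer_scale(3000, 500))
```

For very wide images the `max_side` cap wins, so the shortest side ends up well below 608. The Albumentations experiment below uses `A.SmallestMaxSize(max_size=608)`, which matches the shortest-side target but applies no longest-side cap.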

In [None]:
def analyze_image_dims():
    # create list of training and validation samples
    samples = []
    samples.extend(dataset_helper.get_train_samples(list(range(5308))))
    samples.extend(dataset_helper.get_valid_samples(list(range(386))))

    def get_image_dims(sample):
        img = sample[0]
        width = img["width"]
        height = img["height"]
        return (width, height, float(width)/height)    

    image_dims = tuple(zip(*[get_image_dims(sample) for sample in samples]))
    
    def plot_histogram(ax, xlabel, data):
        min_val = np.min(data)
        max_val = np.max(data)
        ave = np.mean(data)
        title = f"{xlabel} Histogram (Min: {min_val:.2f}, Max: {max_val:.2f}, Ave: {ave:.2f})"
        ax.hist(data, facecolor=(1., 0.549, 0))
        ax.set_xlabel(xlabel)
        ax.set_ylabel("Count")
        ax.set_title(title)

    fig, axs = plt.subplots(3, 1)
    fig.set_figwidth(6)
    fig.set_figheight(9)

    xlabels = ("Width", "Height", "Aspect Ratio")
    for ax, xlabel, data in zip(axs, xlabels, image_dims):
        plot_histogram(ax, xlabel, data)

    fig.tight_layout()
    plt.show()
    
    del samples
    del image_dims


analyze_image_dims()

## <font style="color:orange">Dataset and Transform Experimentation</font>

In [None]:
import albumentations as A
from albumentations.pytorch import ToTensorV2

class AlbumentationsCocoDataset(CocoDataset):
    def __getitem__(self, idx):
        img = self.load_image(idx)
        annot = self.load_annotations(idx)
        if self.transform:
            transformed = self.transform(image=img, bboxes=annot)
            img = transformed["image"]
            annot = transformed["bboxes"]
        return {'img': img, 'annot': annot}
    
    def load_image(self, image_index):
        image_info = self.coco.loadImgs(self.image_ids[image_index])[0]
        path = os.path.join(self.root_dir, self.img_dir, self.set_name, image_info['file_name'])
        img = skimage.io.imread(path)

        if len(img.shape) == 2:
            img = skimage.color.gray2rgb(img)

        return img #.astype(np.float32)/255.0

In [None]:
print("CoCoDataset")
dataset1 = CocoDataset(
    root_dir = os.path.join(data_dir, "license_plates"),
    img_dir = "./",
    set_dir = "images_valid",
    transform = transforms.Compose([Normalizer(), Resizer()])
)

sample = dataset1[5]
img, annot = sample['img'], sample['annot']
print()
print(sample.keys())
print(annot, type(annot))
print(img.shape, type(img))

# split into boxes and labels
#boxes = annot[:, :4]
#labels = (annot[:, 4] + 0.5).type(torch.int64)
#print()
#print(boxes)
#print(labels)

# join back into a single tensor
#join = torch.cat((boxes, labels[:, None]), dim=1)
#print()
#print(f"annot: {join}")

print()
print("AlbumentationsCocoDataset")
dataset2 = AlbumentationsCocoDataset(
    root_dir = os.path.join(data_dir, "license_plates"),
    img_dir = "./",
    set_dir = "images_valid",
    transform = A.Compose([
        A.SmallestMaxSize(max_size=608),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2()
    ], bbox_params=A.BboxParams(format='pascal_voc'))
)

sample = dataset2[5]
img, annot = sample['img'], sample['annot']
cns, rows, cols = img.shape  # ToTensorV2 returns a channels-first (C, H, W) tensor

print()
print(sample.keys())
print(annot, type(annot))
print(img.shape, type(img))


#  <font style="color:green">1. Plot Ground Truth Bounding Boxes [20 Points]</font> 

**You have to show three images from validation data with the bounding boxes.**

The plotted images should be similar to the following:

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g1.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g2.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-g3.png'>



## <font style="color:orange">Annotator</font>

[Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) is capable of annotating objects. However, its functionality is limited, the annotations are not aesthetically pleasing, and it cannot meet this project's requirements, e.g., annotating in-memory frames. Consequently, I wrote my own `Annotator` class.

In [None]:
class Annotator(object):
    def __init__(self, image_dir):
        self.__image_dir = image_dir
        
    def __annotate_image(self, img_bgr, bboxes, scores=None, color=(255, 140, 0)):
        scale = 1
        thick = 1
        white = (255, 255, 255)
        for idx, bbox in enumerate(bboxes):
            xminb, yminb, xmaxb, ymaxb = bbox
            xminb = int(xminb + 0.5)
            yminb = int(yminb + 0.5)
            xmaxb = int(xmaxb + 0.5)
            ymaxb = int(ymaxb + 0.5)
            cv2.rectangle(img_bgr, (xminb, yminb), (xmaxb, ymaxb), color, 5)

            text = "lic. plate"
            if scores is not None:
                text = format(scores[idx], ".0%")
            text_size, _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_DUPLEX, scale, thick)

            pad = 8
            xmint = xminb
            ymint = yminb - text_size[1] - pad
            xmaxt = xminb + text_size[0] + pad
            ymaxt = yminb
            xtext = xmint + int(pad / 2)
            ytext = ymint + text_size[1] + int(pad / 2)
            cv2.rectangle(img_bgr, (xmint, ymint), (xmaxt, ymaxt), color, 5)
            cv2.rectangle(img_bgr, (xmint, ymint), (xmaxt, ymaxt), color, -1)
            cv2.putText(img_bgr, text, (xtext, ytext), cv2.FONT_HERSHEY_DUPLEX, scale, white, thick)
        return img_bgr
    
    def __to_min_max_bbox(self, gt_bbox):
        xmin, ymin, width, height = gt_bbox
        return [xmin, ymin, xmin + width, ymin + height]

    def annotate_sample(self, sample, color=(255, 140, 0)):
        img, anns = sample
        img_bgr = cv2.imread(os.path.join(self.__image_dir, img["file_name"]))
        bboxes = [self.__to_min_max_bbox(ann["bbox"]) for ann in anns]
        return self.__annotate_image(img_bgr, bboxes, None, color)
    
    def annotate_image_bgr_pred(self, img_bgr, bboxes, scores, color=(255, 140, 0), vis_threshold = 0.4):
        mask = scores > vis_threshold
        return self.__annotate_image(img_bgr, bboxes[mask], scores[mask], color)
    
    def annotate_image_name_pred(self, img_name, bboxes, scores, color=(255, 140, 0), vis_threshold = 0.4):
        img_bgr = cv2.imread(os.path.join(self.__image_dir, img_name))
        return self.annotate_image_bgr_pred(img_bgr, bboxes, scores, color, vis_threshold)

    @classmethod
    def show_image(cls, img_bgr):
        # OpenCV images are BGR, while Matplotlib expects RGB
        plt.figure(figsize=(12,12))
        plt.imshow(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB))
        plt.show()
    

annotator = Annotator(dataset_helper.valid_image_dir)    
samples = dataset_helper.get_valid_samples([4, 5, 6])

In [None]:
Annotator.show_image(annotator.annotate_sample(samples[0]))

In [None]:
Annotator.show_image(annotator.annotate_sample(samples[1]))

In [None]:
Annotator.show_image(annotator.annotate_sample(samples[2]))

## <font style="color:orange">Model Evaluator</font>

[Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) supplies an `evaluate_coco(dataset, model, threshold=0.05)` function, which evaluates the model against the validation data and prints the COCO metrics. Unfortunately, this function does not return the COCO metrics, so the mAP metric cannot be logged to TensorBoard. Consequently, I modified the aforementioned function to create the `ModelEvaluator` class, which provides the following features.

* Suppresses the (print) output from the `pycocotools.cocoeval.COCOeval` support class.
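* Returns a tuple of the `COCOeval` object, which holds the COCO metrics, and its primary metric, mean average precision (mAP), i.e., `coco_eval.stats[0]`, the AP averaged over IoU thresholds from 0.50 to 0.95.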
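`coco_eval.stats` holds twelve summary values, of which `ModelEvaluator` keeps only the first. The helper below gives each value a name so that more of them could be logged; `COCO_STAT_NAMES` and `named_coco_stats` are hypothetical names, and the ordering assumes the default bbox `summarize()` in pycocotools (AP at IoU 0.50:0.95, 0.50, 0.75, AP by object size, AR at 1/10/100 detections, AR by object size).

```python
# Names for the 12 values produced by COCOeval.summarize() for bbox evaluation,
# in the order they are stored in coco_eval.stats (assumed default parameters).
COCO_STAT_NAMES = (
    "AP_50_95", "AP_50", "AP_75",
    "AP_small", "AP_medium", "AP_large",
    "AR_1", "AR_10", "AR_100",
    "AR_small", "AR_medium", "AR_large",
)


def named_coco_stats(stats):
    """Pair each COCOeval summary value with a readable, TensorBoard-safe name."""
    if len(stats) != len(COCO_STAT_NAMES):
        raise ValueError(f"expected {len(COCO_STAT_NAMES)} stats, got {len(stats)}")
    return {name: float(value) for name, value in zip(COCO_STAT_NAMES, stats)}
```

In `MyDetector.Train`, each item could be logged with `writer.add_scalar(f"test/{name}", value, epoch_num)`, after checking that `coco_eval` is not `None` (it is `None` when no detection passes the score threshold).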

In [None]:
class ModelEvaluator(object):
    def __init__(self, model, dataset):
        self._model = model
        self._dataset = dataset
        
    def Evaluate(self, results_path=None, threshold=0.05):
        if results_path is None:
            results_path = f"{self._dataset.set_name}_results.json"

        self._model.eval()
        with torch.no_grad():
            results = []
            image_ids = []
            data_len = len(self._dataset)
            bar_format = "{l_bar}{bar}| {n_fmt:>4}/{total_fmt:4} [{elapsed}<{remaining}, {rate_fmt}{postfix}]"
            tqdm_pbar = tqdm(desc="CoCoEval", bar_format=bar_format, total=data_len+1, mininterval=1., unit="image")
            for data_idx, data in enumerate(self._dataset):
                tqdm_pbar.update(1)
                
                # run network
                scores, labels, bboxes = self._model(data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0))
                scores = scores.cpu()
                labels = labels.cpu()
                bboxes = bboxes.cpu()
                
                if bboxes.shape[0] > 0:
                    # correct boxes for image scale and change to COCO format
                    bboxes /= data["scale"]
                    bboxes[:, 2] -= bboxes[:, 0]
                    bboxes[:, 3] -= bboxes[:, 1]

                    for bbox_idx in range(bboxes.shape[0]):
                        score = float(scores[bbox_idx])
                        label = int(labels[bbox_idx])
                        bbox  = bboxes[bbox_idx, :]

                        # since scores are sorted, we can jump out of this loop
                        # once we reach a score that is smaller than the threshold
                        if score < threshold:
                            break

                        results.append({
                            "image_id"    : self._dataset.image_ids[data_idx],
                            "category_id" : self._dataset.label_to_coco_label(label),
                            "score"       : float(score),
                            "bbox"        : bbox.tolist(),
                        })

                # append image to list of processed images
                image_ids.append(self._dataset.image_ids[data_idx])

            mean_ap = 0.
            coco_eval = None
            if len(results) > 0:
                with io.capture_output() as captured: # suppress output
                    # write results to disk for later inspection
                    with open(results_path, 'w') as f:
                        json.dump(results, f, indent=4)

                    # load results in COCO evaluation tool
                    # (loadRes also accepts the in-memory list; the file is kept as a record)
                    coco_true = self._dataset.coco
                    coco_pred = coco_true.loadRes(results_path)

                    # run COCO evaluation
                    coco_eval = COCOeval(coco_true, coco_pred, "bbox")
                    coco_eval.params.imgIds = image_ids
                    coco_eval.evaluate()
                    coco_eval.accumulate()
                    coco_eval.summarize()
                    mean_ap = coco_eval.stats[0].item()
            
            tqdm_pbar.set_postfix(mAP=f"{mean_ap:.03f}")
            tqdm_pbar.update(1)
            tqdm_pbar.close()
            return coco_eval, mean_ap

#  <font style="color:green">2. Training [25 Points]</font> 

- **Write your training code in this section.**

- **You also have to share ([shared logs example](https://tensorboard.dev/experiment/JRtnsKbwTaq1ow6nPLPGeg)) the loss plot of your training using tensorboard.dev.** 

Instructions for sharing TensorBoard logs on tensorboard.dev are available [here](https://courses.opencv.org/courses/course-v1:OpenCV+OpenCV-106+2019_T1/courseware/b1c43ffe765246658e537109e188addb/d62572ec8bd344db9aeae81235ede618/4?activate_block_id=block-v1%3AOpenCV%2BOpenCV-106%2B2019_T1%2Btype%40vertical%2Bblock%40398b46ddcd5c465fa52cb4d572ba3229).

## <font style="color:orange">"Refactoring" Monk Detector Class (via Inheritance)</font>

The Monk `Detector` class's `Train` method prints training progress to the cell's output window, but this project requires the use of [TensorBoard](https://www.tensorflow.org/tensorboard/). Rather than refactor the Monk framework, I derived a new detector class, `MyDetector`, from Monk's `Detector` class. `MyDetector` adds the following features.

* Progress is displayed using `tqdm` progress bars.
* The optimizer and LR scheduler can be specified (this project uses `Adam` and `ReduceLROnPlateau`).
* Average loss is computed on the validation data each epoch.
* The model is saved each time the validation loss reaches a new minimum.
* Training terminates after N (`no_best_thd`) consecutive epochs without a new minimum validation loss.
* The following parameters and metrics are logged to TensorBoard.
  * learning rate
  * training class loss
  * training regression loss
  * training total loss
  * validation class loss
  * validation regression loss
  * validation total loss
  * validation mAP

Monk's data augmentation for RetinaNet is limited: it only flips the training images horizontally at random. If I cannot achieve the mAP goal of 0.50 with a simpler RetinaNet model that does not overfit the training data, then I will have to implement and/or integrate additional augmentation transforms.
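A horizontal flip has to mirror the box coordinates as well as the pixels. The sketch below shows the arithmetic with NumPy, assuming boxes in `[x1, y1, x2, y2, ...]` pixel format like the `annot` arrays returned by `CocoDataset`; `hflip_image_and_boxes` is a hypothetical helper, not Monk's augmenter.

```python
import numpy as np


def hflip_image_and_boxes(image, boxes):
    """Mirror an HxWxC image left-to-right and update [x1, y1, x2, y2, ...] boxes.

    A box spanning [x1, x2] maps to [W - x2, W - x1]: the left and right edges
    swap roles, y coordinates are unchanged, and extra columns (e.g. the class
    label) are copied as-is.
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes
```

Reading both edges from the original `boxes` array (rather than updating in place) avoids overwriting `x1` before it is used to compute the new `x2`.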

Prior to the training loop, the validation loss is computed on the pretrained model. The following operations are performed each epoch.

* Training cycle that computes losses on the training data and performs backpropagation.
* Validation cycle that computes losses on the validation data.
* Evaluation cycle that computes mAP on the validation data.
* Scheduler update.
* Training termination check.

Since Monk does not support data subsets, I validated the training pipeline with temporary code that prematurely terminated the training and validation cycles. The `ReduceLROnPlateau` scheduler and the training termination code were tested with temporary code that simulated overfitting by monotonically increasing the validation loss.
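The termination rule can also be exercised in isolation, without a GPU. The sketch below extracts the logic used in `MyDetector.Train` (record a new minimum validation loss, stop after `no_best_thd` epochs without one) and feeds it a rising validation loss, the same overfitting simulation described above; `NoImprovementStopper` is a hypothetical class, not part of Monk or this notebook's code.

```python
class NoImprovementStopper:
    """Hypothetical helper mirroring the termination rule in MyDetector.Train.

    Tracks the lowest validation loss seen so far and reports when `patience`
    consecutive epochs pass without a new minimum.
    """

    def __init__(self, initial_loss, patience=10):
        self.best = initial_loss          # e.g. the loss of the pretrained model
        self.patience = patience
        self.epochs_since_best = 0

    def step(self, valid_loss):
        """Return (improved, should_stop) for this epoch's validation loss."""
        if valid_loss < self.best:
            self.best = valid_loss        # Train() would save the model here
            self.epochs_since_best = 0
            return True, False
        self.epochs_since_best += 1
        return False, self.epochs_since_best >= self.patience


# Simulate overfitting: one improvement, then the validation loss keeps rising.
stopper = NoImprovementStopper(initial_loss=1.0, patience=3)
history = []
for epoch, loss in enumerate([0.9, 0.95, 1.0, 1.05, 1.1], start=1):
    improved, stop = stopper.step(loss)
    history.append((epoch, improved, stop))
    if stop:
        break
```

With `patience=3`, training stops at epoch 4, three epochs after the last improvement, and the fifth loss is never consumed.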

In [None]:
class Stats(object):
    def __init__(self):
        self._n = 0
        self._total = 0.

    @property
    def n(self):
        return self._n

    @property
    def mean(self):
        if self._n == 0:
            return 0.
        return self._total / self._n

    def add(self, x):
        self._n += 1
        self._total += x


class MyDetector(Detector):
    def __init__(self, verbose=0):
        super().__init__(verbose)
        self.__model_evaluator = None

    @property
    def ModelParameters(self):
        return self.system_dict["local"]["model"].parameters()
        
    @property
    def ModelEvaluator(self):
        if self.__model_evaluator is None:
            self.__model_evaluator = ModelEvaluator(
                self.system_dict["local"]["model"],
                self.system_dict["local"]["dataset_val"]
            )
        return self.__model_evaluator
        
    def Set_Hyperparams(self, lr=0.0001, val_interval=1, print_interval=20):
        """
        Use SetOptimizerAndScheduler to initialize the optimizer and LR scheduler.
        """
        return
        
    def SetOptimizerAndScheduler(self, optimizer, scheduler):
        self.system_dict["local"]["optimizer"] = optimizer
        self.system_dict["local"]["scheduler"] = scheduler
        
    def Load(self, model_path="final_model.pt"):
        self.system_dict["local"]["model"] = torch.load(model_path)
        if torch.cuda.is_available():
            self.system_dict["local"]["model"] = self.system_dict["local"]["model"].cuda()
        
    def Train(self, num_epochs, exper_name, no_best_thd=10):
        saved_model = exper_name + ".pt"
        log_dir = os.path.join(runs_dir, exper_name)
        self.system_dict["output"]["saved_model"] = saved_model
        self.system_dict["params"]["num_epochs"] = num_epochs
        
        model = self.system_dict["local"]["model"]
        device = self.system_dict["local"]["device"]
        optimizer = self.system_dict["local"]["optimizer"]
        scheduler = self.system_dict["local"]["scheduler"]
        dataloader_train = self.system_dict["local"]["dataloader_train"]
        dataloader_valid = self.system_dict["local"]["dataloader_val"]

        writer = SummaryWriter(log_dir=log_dir)

        def TrainCycle(epoch_num):
            train_closs, train_rloss, train_tloss = MyDetector._TrainCycle(
                f"Train {epoch_num:02d}", 
                model, 
                device, 
                dataloader_train, 
                optimizer
            )
            writer.add_scalar("training/class_loss", train_closs, epoch_num)
            writer.add_scalar("training/regression_loss", train_rloss, epoch_num)
            writer.add_scalar("training/total_loss", train_tloss, epoch_num)
            return train_tloss

        def ValidCycle(epoch_num):
            valid_closs, valid_rloss, valid_tloss = MyDetector._ValidCycle(
                f"Valid {epoch_num:02d}", 
                model, 
                device, 
                dataloader_valid
            )
            
            writer.add_scalar("validation/class_loss", valid_closs, epoch_num)
            writer.add_scalar("validation/regression_loss", valid_rloss, epoch_num)
            writer.add_scalar("validation/total_loss", valid_tloss, epoch_num)
            return valid_tloss
            
        epochs_since_best = 0
        min_valid_loss = ValidCycle(0)
        for epoch_num in range(1, num_epochs + 1):
            train_loss = TrainCycle(epoch_num)
            valid_loss = ValidCycle(epoch_num)

            coco_eval, mean_ap = self.ModelEvaluator.Evaluate()
            writer.add_scalar("test/mAP", mean_ap, epoch_num)

            writer.add_scalar("learning_rate", optimizer.param_groups[0]['lr'], epoch_num)
            
            if isinstance(scheduler, sched.ReduceLROnPlateau):
                scheduler.step(valid_loss)
            else:
                scheduler.step()

            if min_valid_loss > valid_loss:
                epochs_since_best = 0
                min_valid_loss = valid_loss
                torch.save(model, saved_model)
            else:
                epochs_since_best += 1
                if no_best_thd is not None and epochs_since_best >= no_best_thd:
                    break
        
        writer.close()
        model.eval()

    @classmethod
    def _CreateProgBarIter(cls, desc, dataloader):
        bar_format = "{l_bar}{bar}| {n_fmt:>4}/{total_fmt:4} [{elapsed}<{remaining}, {rate_fmt}{postfix}]"
        return tqdm(dataloader, bar_format=bar_format, desc=desc, mininterval=1., unit="batch")

    @classmethod
    def _TrainCycle(cls, tqdm_desc, model, device, dataloader, optimizer):
        model.train()
        
        cstats = Stats(); rstats = Stats(); tstats = Stats()
        tqdm_dataloader = MyDetector._CreateProgBarIter(tqdm_desc, dataloader)
        for iter_num, data in enumerate(tqdm_dataloader):
            optimizer.zero_grad()

            img = data["img"].to(device).float()
            ann = data["annot"].to(device)
            closs, rloss = model([img, ann])

            closs = closs.mean()
            rloss = rloss.mean()
            tloss = closs + rloss

            tloss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)

            optimizer.step()

            cstats.add(float(closs))
            rstats.add(float(rloss))
            tstats.add(float(tloss))
            loss_text = f"{{C:{cstats.mean:.3f}, R:{rstats.mean:.3f}, T:{tstats.mean:.3f}}}"
            tqdm_dataloader.set_postfix(losses=loss_text)

            del closs
            del rloss

        tqdm_dataloader.close()
        return cstats.mean, rstats.mean, tstats.mean

    @classmethod
    def _ValidCycle(cls, tqdm_desc, model, device, dataloader):
        # Note: If model is set to evaluation mode, then it returns scores, labels and bboxes.
        #       We want classification and regression loss, so we set it to training mode and
        #       use torch.no_grad() to avoid gradient computation.
        model.train()
        cstats = Stats(); rstats = Stats(); tstats = Stats()
        tqdm_dataloader = MyDetector._CreateProgBarIter(tqdm_desc, dataloader)
        for iter_num, data in enumerate(tqdm_dataloader):
            img = data["img"].to(device).float()
            ann = data["annot"].to(device)
            with torch.no_grad():        
                closs, rloss = model([img, ann])

            closs = closs.mean()
            rloss = rloss.mean()
            tloss = closs + rloss

            cstats.add(float(closs))
            rstats.add(float(rloss))
            tstats.add(float(tloss))
            loss_text = f"{{C:{cstats.mean:.3f}, R:{rstats.mean:.3f}, T:{tstats.mean:.3f}}}"
            tqdm_dataloader.set_postfix(losses=loss_text)

            del closs
            del rloss

        tqdm_dataloader.close()
        return cstats.mean, rstats.mean, tstats.mean
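The checkpoint and early-stopping rule in `Train` can be followed without a GPU. Below is a minimal pure-Python replay of that bookkeeping. `simulate_early_stopping` is a hypothetical helper, not part of Monk or `MyDetector`, and it takes a list of validation losses in place of the real `ValidCycle` calls. The first four losses are the validation totals from the resnet18 log in the Training Experiments section. The remaining losses are made up for illustration.

```python
def simulate_early_stopping(valid_losses, no_best_thd=None):
    """Replay the best-model / early-stopping bookkeeping of MyDetector.Train.

    valid_losses[0] plays the role of ValidCycle(0) (the untrained model);
    valid_losses[k] is the validation loss after epoch k.
    Returns (epochs at which the model would be saved, last epoch run).
    """
    min_valid_loss = valid_losses[0]
    epochs_since_best = 0
    saved_epochs = []
    last_epoch = 0
    for epoch_num in range(1, len(valid_losses)):
        last_epoch = epoch_num
        valid_loss = valid_losses[epoch_num]
        if min_valid_loss > valid_loss:
            # new best validation loss: Train calls torch.save(model, saved_model) here
            epochs_since_best = 0
            min_valid_loss = valid_loss
            saved_epochs.append(epoch_num)
        else:
            # no improvement: stop once no_best_thd epochs pass without a new best
            epochs_since_best += 1
            if no_best_thd is not None and epochs_since_best >= no_best_thd:
                break
    return saved_epochs, last_epoch


# 2.012, 0.716, 0.509, 0.501 come from the resnet18 log; the rest are hypothetical
losses = [2.012, 0.716, 0.509, 0.501, 0.520, 0.530, 0.510, 0.490, 0.550]
print(simulate_early_stopping(losses, no_best_thd=3))  # ([1, 2, 3], 6)
print(simulate_early_stopping(losses))                 # ([1, 2, 3, 7], 8)
```

With `no_best_thd=3`, training would stop after epoch 6, and the checkpoint written at epoch 3 would be the one kept. Without a threshold, the late improvement at epoch 7 would be saved instead.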

## <font style="color:orange">Training Experiments</font>

Training experiments are conducted via the `conduct_experiment(model_name, exper_name)` function. The `exper_name` parameter is the experiment name, and the trained model is saved as `<exper_name>.pt`. The `model_name` parameter selects the ResNet backbone and may be one of the following values.

* resnet18
* resnet34
* resnet50
* resnet101
* resnet152

tqdm progress bars display training progress. The following output shows the first three epochs of training the resnet18 model.

```
Valid 00: 100%|██████████|   39/39   [00:23<00:00,  1.66batch/s, losses={C:1.034, R:0.979, T:2.012}]
Train 01: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.315, R:0.508, T:0.823}]
Valid 01: 100%|██████████|   39/39   [00:23<00:00,  1.69batch/s, losses={C:0.290, R:0.426, T:0.716}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.99image/s, mAP=0.352]
Train 02: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.174, R:0.333, T:0.507}]
Valid 02: 100%|██████████|   39/39   [00:23<00:00,  1.67batch/s, losses={C:0.181, R:0.328, T:0.509}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.97image/s, mAP=0.431]
Train 03: 100%|██████████|  531/531  [12:12<00:00,  1.38s/batch, losses={C:0.132, R:0.282, T:0.415}]
Valid 03: 100%|██████████|   39/39   [00:22<00:00,  1.72batch/s, losses={C:0.203, R:0.298, T:0.501}]
CoCoEval: 100%|██████████|  387/387  [01:04<00:00,  5.98image/s, mAP=0.447]
```

The TensorBoard logs may be viewed by clicking this [link]().

In [None]:
def conduct_experiment(model_name, exper_name):
    trainer = MyDetector()
    batch_size = {
        "resnet18": 10,
        "resnet34": 8,
        "resnet50": 6,
        "resnet101": 4,
        "resnet152": 2
    }[model_name]
    
    with io.capture_output() as captured: # suppress output
        trainer.Train_Dataset(
            root_dir = data_dir,          # path to root directory containing coco_dir
            coco_dir = "license_plates",  # name of coco_dir containing image and annotation folders
            img_dir = "./",               # name of folder containing all training and validation folders
            set_dir = "images_train",     # name of folder containing all training images
            batch_size = batch_size,      # mini batch sampling size for training epochs
            image_size = 512,             # must be 300 or 512
            use_gpu = True,               # train on GPU if true; otherwise, train on CPU
            num_workers = 12              # number of worker processes for the data loader
        )

        trainer.Val_Dataset(
            root_dir = data_dir,          # path to root directory containing coco_dir
            coco_dir = "license_plates",  # name of coco_dir containing image and annotation folders
            img_dir = "./",               # name of folder containing all training and validation folders
            set_dir = "images_valid"      # name of folder containing all validation images
        )

        trainer.Model(model_name=model_name)   

    optimizer = optim.Adam(trainer.ModelParameters, lr=0.0001)
    scheduler = sched.ReduceLROnPlateau(optimizer, factor=math.sqrt(0.1), patience=3, threshold=0.001)
    trainer.SetOptimizerAndScheduler(optimizer, scheduler)
    trainer.Train(num_epochs=99, exper_name=exper_name)
    
    del trainer
    torch.cuda.empty_cache()


# each call below trains one model for up to 99 epochs; comment out any experiment you do not want to run
conduct_experiment("resnet18", "RetinaNet18")
conduct_experiment("resnet34", "RetinaNet34")
conduct_experiment("resnet50", "RetinaNet50")
conduct_experiment("resnet101", "RetinaNet101")
conduct_experiment("resnet152", "RetinaNet152")
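The scheduler in `conduct_experiment` multiplies the learning rate by `math.sqrt(0.1)` (about 0.316), so two reductions lower it by a factor of 10. The sketch below is a simplified, pure-Python model of PyTorch's `ReduceLROnPlateau` in its default `mode="min"`, `threshold_mode="rel"` setting. It only shows when reductions happen with `patience=3` and `threshold=0.001`, and it ignores `cooldown`, `min_lr`, and `eps`. `plateau_lr_schedule` is a hypothetical name, not a PyTorch function.

```python
import math


def plateau_lr_schedule(valid_losses, lr=1e-4, factor=math.sqrt(0.1), patience=3, threshold=1e-3):
    """Return the learning rate after each scheduler.step(valid_loss) call (simplified model)."""
    best = math.inf
    num_bad_epochs = 0
    lrs = []
    for loss in valid_losses:
        # "rel" mode: an epoch counts as an improvement only if it beats best by more than 0.1%
        if loss < best * (1 - threshold):
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        # reduce once the loss has failed to improve for more than `patience` epochs
        if num_bad_epochs > patience:
            lr *= factor
            num_bad_epochs = 0
        lrs.append(lr)
    return lrs


lrs = plateau_lr_schedule([1.0, 0.9, 0.9, 0.9, 0.9, 0.9])
print(lrs[4], lrs[5])  # 0.0001, then about 3.16e-05 after the first reduction
```

The first reduction happens on the fourth consecutive epoch without improvement, not the third, because the comparison is `num_bad_epochs > patience`.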

#  <font style="color:green">3. Inference [15 Points]</font> 

**You have to make predictions from your trained model on three images from the validation dataset.**

The plotted images should be similar to the following:

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p1.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p2.png'>



<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w10-p3.png'>




## <font style="color:orange">"Refactoring" Monk Inference Class (via Inheritance)</font>

The Monk `Infer` class's `Predict` method accepts only a path to an image, and it does not annotate images as nicely as my `Annotator` class. Rather than refactor the Monk framework, I derived a new inference class, `MyInferencer`, from Monk's `Infer` class; it adds the following features.

* Predictions may be performed on in-memory RGB and BGR images.
* Predictions and result annotation may be performed on validation samples.
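`PredictViaImageBGR` converts OpenCV's BGR frames to RGB by reversing the channel axis with `img[:, :, ::-1]`. The NumPy sketch below shows what that slice does. It produces a view with a negative channel stride rather than a copy. `torch.from_numpy` rejects arrays with negative strides, but `PredictViaImageRGB` never passes such a view to it, because the resized image is copied into a freshly allocated `new_image` first. The 2x2 test image is purely illustrative.

```python
import numpy as np

# a 2x2 image filled with pure blue in OpenCV's BGR channel order
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

# reversing the last axis swaps the B and R channels, giving RGB order
rgb = bgr[:, :, ::-1]
print(rgb[0, 0].tolist())          # [0, 0, 255]
print(rgb.strides[2] < 0)          # True: a view that walks the channels backwards
print(rgb.flags["C_CONTIGUOUS"])   # False

# an explicit copy gives a normal, contiguous layout with positive strides
rgb_contig = np.ascontiguousarray(rgb)
print(rgb_contig.flags["C_CONTIGUOUS"])  # True
```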

In [None]:
class MyInferencer(Infer):
    def __init__(self, image_dir, annotator, verbose=0):
        super().__init__(verbose)
        self.system_dict["dataset"] = {};
        self.system_dict["dataset"]["val"] = {};
        self.__image_dir = image_dir
        self.__annotator = annotator
        self.__model_evaluator = None
        
    @property
    def ModelEvaluator(self):
        if self.__model_evaluator is None:
            self.__model_evaluator = ModelEvaluator(
                self.system_dict["local"]["model"],
                self.system_dict["local"]["dataset_val"]
            )
        return self.__model_evaluator

    def Val_Dataset(self, root_dir, coco_dir, img_dir, set_dir):
        self.system_dict["dataset"]["val"]["status"] = True;
        self.system_dict["dataset"]["val"]["root_dir"] = root_dir;
        self.system_dict["dataset"]["val"]["coco_dir"] = coco_dir;
        self.system_dict["dataset"]["val"]["img_dir"] = img_dir;
        self.system_dict["dataset"]["val"]["set_dir"] = set_dir;  

        self.system_dict["local"]["dataset_val"] = CocoDataset(
            self.system_dict["dataset"]["val"]["root_dir"] + "/" + self.system_dict["dataset"]["val"]["coco_dir"], 
            img_dir = self.system_dict["dataset"]["val"]["img_dir"], 
            set_dir = self.system_dict["dataset"]["val"]["set_dir"],
            transform = transforms.Compose([Normalizer(), Resizer()])
        )

    def PredictViaImageRGB(self, img):
        # normalize image
        image = img.astype(np.float32) / 255.;
        image = (image.astype(np.float32) - self.system_dict["local"]["mean"]) / self.system_dict["local"]["std"];
       
        # compute scaling factor
        rows, cols, cns = image.shape
        smallest_side = min(rows, cols)
        scale = self.system_dict["local"]["min_side"] / smallest_side
        largest_side = max(rows, cols)
        if largest_side * scale > self.system_dict["local"]["max_side"]:
            scale = self.system_dict["local"]["max_side"]  / largest_side

        # resize and pad the image so both sides are multiples of 32
        image = skimage.transform.resize(image, (int(round(rows*scale)), int(round((cols*scale)))))
        rows, cols, cns = image.shape
        pad_rows = 32 - rows%32
        pad_cols = 32 - cols%32
        new_image = np.zeros((rows + pad_rows, cols + pad_cols, cns)).astype(np.float32)
        new_image[:rows, :cols, :] = image.astype(np.float32)

        # convert image to tensor and perform prediction
        img = torch.from_numpy(new_image)
        with torch.no_grad():
            model = self.system_dict["local"]["model"]
            scores, labels, boxes = model(img.cuda().permute(2, 0, 1).float().unsqueeze(dim=0))
            boxes /= scale

        return scores, labels, boxes
        
    def PredictViaImageBGR(self, img):
        return self.PredictViaImageRGB(img[:, :, ::-1])
        
    def PredictViaImageName(self, img_name):
        img_path = os.path.join(self.__image_dir, img_name)
        return self.PredictViaImageRGB(skimage.io.imread(img_path))
    
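    def PredictAndAnnotateSample(self, sample, annotator=None, vis_threshold=0.4):
        # use the annotator passed by the caller, if any; otherwise, use the one given to the constructor
        annotator = annotator if annotator is not None else self.__annotator
        img, _ = sample
        img_name = img["file_name"]
        scores, _, bboxes = self.PredictViaImageName(img_name)
        return annotator.annotate_image_name_pred(
            img_name = img_name, 
            bboxes = bboxes, 
            scores = scores, 
            vis_threshold = vis_threshold)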
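The scale and padding arithmetic in `PredictViaImageRGB` can be checked without a model. `resize_geometry` is a hypothetical helper that repeats that arithmetic. The `min_side=608` and `max_side=1024` values are assumptions for illustration, since the real values come from `system_dict["local"]`. The 1280x720 frame size is only an example.

```python
def resize_geometry(rows, cols, min_side, max_side, stride=32):
    """Return (scale, padded_shape) using the same arithmetic as PredictViaImageRGB."""
    # scale the shorter side to min_side ...
    scale = min_side / min(rows, cols)
    # ... unless that would push the longer side past max_side
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    new_rows, new_cols = int(round(rows * scale)), int(round(cols * scale))
    # pad each side up to the next multiple of stride (adds a full stride if already aligned)
    padded = (new_rows + stride - new_rows % stride, new_cols + stride - new_cols % stride)
    return scale, padded


# assumed min_side/max_side; a 720-row, 1280-column video frame
scale, padded = resize_geometry(720, 1280, min_side=608, max_side=1024)
print(scale, padded)   # 0.8 (608, 1056)

# boxes predicted on the resized image are mapped back by dividing by scale
box = [80.0, 40.0, 160.0, 80.0]
print([round(v / scale, 6) for v in box])   # [100.0, 50.0, 200.0, 100.0]
```

Because the padding is added only at the bottom and right edges, dividing by `scale` is enough to map box coordinates back to the original image, which is what `boxes /= scale` does.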

## <font style="color:orange">Detecting and Annotating License Plates in Sample Validation Images</font>

The following cell creates an inferencer, loads the trained model, and selects the same three validation samples used in Section 1 to plot ground truth bounding boxes. Each of the subsequent three cells performs a prediction on one sample and annotates the detected license plates.

In [None]:
exper_name = "RetinaNet18"
inferencer = MyInferencer(dataset_helper.valid_image_dir, Annotator(dataset_helper.valid_image_dir))
inferencer.Model(model_path=f"{exper_name}.pt")
samples = dataset_helper.get_valid_samples([4, 5, 6])

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[0], annotator))

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[1], annotator))

In [None]:
Annotator.show_image(inferencer.PredictAndAnnotateSample(samples[2], annotator))

#  <font style="color:green">4. COCO Detection Evaluation [25 Points]</font> 

**You have to evaluate your detection model on COCO detection evaluation metric.**

For your reference, here is the COCO evaluation metric chart:


---

<img src="https://www.learnopencv.com/wp-content/uploads/2020/03/c3-w9-coco_metric.png">

---

#### <font style="color:red">The expected `AP` (primary challenge metric) is more than `0.5`.</font>

**The expected output should look similar to the following:**

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.886
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.629
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.256
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.653
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.627
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.504
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.629
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.380
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.704
```



## <font style="color:orange">Detection Model Evaluation</font>

The following cell initializes the inferencer's validation dataset and requests its `ModelEvaluator`. It then calls the model evaluator's `Evaluate` method and prints a summary of the resulting COCO evaluation.

In [None]:
with io.capture_output() as captured: # suppress output
    inferencer.Val_Dataset(
        root_dir = data_dir,          # path to root directory containing coco_dir
        coco_dir = "license_plates",  # name of coco_dir containing image and annotation folders
        img_dir = "./",               # name of folder containing all training and validation folders
        set_dir = "images_valid"      # name of folder containing all validation images
    )

evaluator = inferencer.ModelEvaluator
coco_eval, _ = evaluator.Evaluate()   
coco_eval.summarize()
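The gap between the `IoU=0.50` and `IoU=0.50:0.95` rows of the summary comes from how COCO averages AP: the primary metric is the mean of the AP values computed at ten IoU thresholds, 0.50, 0.55, ..., 0.95. The pure-Python sketch below uses hypothetical helpers, not the `pycocotools` implementation. It computes the IoU of two boxes in `(x1, y1, x2, y2)` corner form (COCO annotations themselves store `[x, y, width, height]`) and lists the thresholds at which a detection with that overlap would count as a match.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# the ten IoU thresholds averaged by AP @[ IoU=0.50:0.95 ]
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_met(iou_value):
    """IoU thresholds at which a detection with this overlap counts as a match."""
    return [t for t in COCO_IOU_THRESHOLDS if iou_value >= t]


gt = (0, 0, 100, 50)       # a 100x50 plate
pred = (10, 0, 110, 50)    # same size, shifted 10 px to the right
overlap = iou(gt, pred)
print(round(overlap, 3))              # 0.818
print(len(thresholds_met(overlap)))   # 7
```

A 10-pixel shift on a 100-pixel-wide plate is still a hit at IoU 0.50 but a miss at the three strictest thresholds, which is why AP averaged over 0.50:0.95 is lower than AP at 0.50.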

# <font style="color:green">5. Run Inference on a Video [15 Points]</font>

#### [Download the Input Video](https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1)

**You have to run inference on a video.** 

You can download the video from [here](https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1).

#### <font style="color:red">Upload the output video on youtube and share the link. Do not upload the video in the lab.</font>

In [None]:
from IPython.display import YouTubeVideo, display
video = YouTubeVideo("18HWHCevFdU", width=640, height=360)
display(video)

**Your output video should have a bounding box around the vehicle registration plate.**

In [None]:
video = YouTubeVideo("5SgCuee7AMs", width=640, height=360)
display(video)

## <font style="color:orange">Detecting and Annotating a Video</font>

I modified the provided sample function to read the source video frame by frame, detect and annotate license plates in each frame, and write the annotated frames to an output video. I uploaded this video to YouTube; it may be viewed in this notebook's last cell.

In [None]:
def annotate_video(src_video_path, dst_video_path, frames_dir=None):
    """
    Create a new video by annotating license plates in the source video.
    
    src_video_path (str): path of the source video to annotate
    dst_video_path (str): path of the destination video to create
    frames_dir (str, optional): if specified, directory in which to write the annotated frames
    """
    
    # create a video reader
    reader = cv2.VideoCapture(src_video_path)
    if not reader.isOpened(): 
        print("Error opening video file")
        return
        
    # get source video attributes
    width = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frames_per_second = float(reader.get(cv2.CAP_PROP_FPS))
    
    # create a video writer
    writer = cv2.VideoWriter(
        filename = dst_video_path,
        fourcc = cv2.VideoWriter_fourcc(*"mp4v"),
        fps = frames_per_second,
        frameSize = (width, height),
        isColor = True,
    )
    
    # cv2.imwrite does not create missing directories, so create frames_dir up front
    if frames_dir is not None:
        os.makedirs(frames_dir, exist_ok=True)
    
    # for each frame in the source video ...
    count = 0
    while reader.isOpened():
        # read the next frame; ret is False once the video is exhausted
        ret, frame = reader.read()
        if not ret:
            break
        # detect and annotate license plates (uses the global inferencer and annotator)
        scores, _, bboxes = inferencer.PredictViaImageBGR(frame)
        frame = annotator.annotate_image_bgr_pred(frame, bboxes, scores, color=(0, 140, 255))
        # write frame to destination video
        writer.write(frame)
        # if specified, write frame to frames directory
        if frames_dir is not None:
            cv2.imwrite(os.path.join(frames_dir, f"frame{count:04d}.jpg"), frame)
        count += 1
    
    # free reader and writer resources
    reader.release()
    writer.release()
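`cv2.VideoWriter_fourcc(*"mp4v")` unpacks the string into four characters and packs them into a single 32-bit integer, with the first character in the lowest byte. The pure-Python sketch below reproduces that packing using hypothetical `pack_fourcc`/`unpack_fourcc` helpers. The reverse direction can help when checking the source video's codec, for example by decoding `int(reader.get(cv2.CAP_PROP_FOURCC))`.

```python
def pack_fourcc(code):
    """Pack a four-character codec code into an int, first character in the lowest byte."""
    if len(code) != 4:
        raise ValueError("a fourcc code must have exactly four characters")
    c1, c2, c3, c4 = (ord(c) & 0xFF for c in code)
    return c1 | (c2 << 8) | (c3 << 16) | (c4 << 24)


def unpack_fourcc(value):
    """Recover the four characters from a packed fourcc integer."""
    value = int(value)  # VideoCapture.get returns a float
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))


packed = pack_fourcc("mp4v")
print(hex(packed))            # 0x7634706d
print(unpack_fourcc(packed))  # mp4v
```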

In [None]:
# ! wget -O project3-input-video.mp4 https://www.dropbox.com/s/g88o4dx18zpgn8k/projet3-input-video.mp4?dl=1
annotate_video("project3-input-video.mp4", "project3-output-video.mp4", "./Frames")

In [None]:
#video = YouTubeVideo("", width=640, height=360)
#display(video)

## <font style="color:orange">Conclusion</font>

The RetinaNet model with the Feature Pyramid Network backbone on top of a feedforward ResNet-## architecture achieved a mAP score of 0.5#.

The utility of the [Monk](https://github.com/Tessellate-Imaging/Monk_Object_Detection) framework was questionable. Several blog posts advertise building a custom object detector in five lines of code. Technically, this is possible, but one is then at the mercy of the framework's creators. For this project, I rewrote most of the training, validation, testing, and visualization code. Fortunately, I did not have to modify the RetinaNet implementation. Monk has potential, but it needs significant refactoring to allow basic customization, such as specifying an optimizer and scheduler or computing the average loss on the validation set to detect overfitting.