# 7144COMP/CW2: Bird Multiple Object Detection Using Faster R-CNN ResNet101 Network 
## PART IV: Model evaluation and deployment

### Overview

In this notebook, I evaluate my model with TensorBoard and use the logged metrics to assess model convergence. Both the validation loss and the mean average precision (mAP) at Intersection over Union (IoU) thresholds of 0.5 and 0.75 are considered.

The model was trained for 1 epoch; the reason for this choice is explained in the training notebook. During this first epoch, the loss (smoothed with a weight of 0.8) converged around its final value.
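
The smoothed loss mentioned above can be reproduced outside TensorBoard. The following is a minimal sketch, assuming that TensorBoard's smoothing slider behaves as a debiased exponential moving average. The function name `smooth` and the sample loss values are hypothetical and only illustrate the effect of a weight of 0.8.

```python
def smooth(values, weight=0.8):
    """Return a debiased exponential moving average of `values`.

    Assumption: this approximates TensorBoard's smoothing slider.
    `weight` must be in the range [0, 1).
    """
    smoothed = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        running = running * weight + (1.0 - weight) * value
        # Divide by the debiasing factor so early points are not pulled toward zero.
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


# Hypothetical raw loss values, not taken from the actual training run.
raw_loss = [1.20, 0.90, 1.10, 0.60, 0.70, 0.50]
smoothed_loss = smooth(raw_loss, weight=0.8)
print([round(v, 3) for v in smoothed_loss])
```

A higher weight gives a flatter curve that reacts more slowly to new points, which makes it easier to judge whether the loss has plateaued.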



For the current task, the following steps have been undertaken: 

- Launch TensorBoard displaying both the train and evaluation metrics for the given session. 
- Provide justification for the number of epochs used for training the object detection model.

### Next

In the next notebook, which is an extension of this one, I will:

- Freeze my trained model in the correct format for inference
- Develop a Jupyter Notebook to perform inference on the frozen model using unseen test images
- Discuss my results.

### Prerequisites
This notebook runs locally in the *tf-gpu* environment. The following steps must be completed first:
- Environment Setup (see Part 0)
- Preprocessing (see Part 1)
- Training (see Part 2)
- Run the necessary evaluation scripts (see Part 3)

## 1. Import the necessary packages

In [2]:
import os

In [3]:
# Current directory
current_dir = os.getcwd()
# Model training directory and config pipeline
model_dir = os.path.join(current_dir, 'training')
pipeline_config_path = 'fasterrcnn_config.config'

## 2. TensorBoard 
### 2.1. Monitor region proposal losses in real-time
Here, ```logdir``` points to the train directory. Launching the next cell makes TensorBoard load the loss graphs for the region proposal network (RPN) and update them as training progresses.

The losses for the Region Proposal Network:

- ```Loss/RPNLoss/localization_loss```: Localization Loss or the Loss of the Bounding Box regressor for the RPN

- ```Loss/RPNLoss/objectness_loss```: Loss of the Classifier that classifies if a bounding box is an object of interest or background

The losses for the Final Classifier:

- ```Loss/BoxClassifierLoss/classification_loss```: Loss for the classification of detected objects into various classes (Cat, Dog, Airplane, etc.)

- ```Loss/BoxClassifierLoss/localization_loss```: Localization Loss or the Loss of the Bounding Box regressor



In [6]:
!tensorboard --logdir $current_dir'/training/train'

2022-12-23 00:52:43.287233: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-23 00:52:43.934477: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-23 00:52:43.934530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-23 00:52:44.720641: E tensorflow/compiler/xla/stream_executor/cuda/c

### 2.2. Display the train and evaluation metrics for the given session 

In [4]:
# This requires stopping the previous TensorBoard Server
# Execute tensorboard and point logdir to the eval folder to load
# DetectionBoxes Precision and Recall Metrics at STEP 28000
!tensorboard --logdir $current_dir'/training/eval'

2022-12-24 18:02:47.277700: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-24 18:02:47.924591: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-24 18:02:47.924640: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-24 18:02:48.728956: E tensorflow/compiler/xla/stream_executor/cuda/c

## 3. Discussion


- Our mean average precision (mAP) scores at IoU thresholds of 0.5 and 0.75 were ```mAP@.5 = 0.8711``` and ```mAP@.75 = 0.6416``` respectively, which are above the mAP scores of the original Faster R-CNN ResNet101 V1 640x640 (trained on the MS COCO dataset). Thanks to transfer learning, the model improved its detection precision on our custom dataset.
- Our average recall (sensitivity) score was ```AR = 0.6083```.

Ideally, the higher the mAP and AR, the better the model's performance.
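
To make the two IoU thresholds concrete, here is a minimal sketch of how IoU decides whether a predicted box matches a ground-truth box. The function `iou`, the `[ymin, xmin, ymax, xmax]` box layout, and the example coordinates are assumptions chosen for illustration; this is not the evaluation code used by the training pipeline.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as [ymin, xmin, ymax, xmax]."""
    # Corners of the overlapping rectangle (if any).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp to zero so non-overlapping boxes give an empty intersection.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: the prediction covers 60% of the ground-truth box.
ground_truth = [0, 0, 10, 10]
prediction = [0, 0, 10, 6]

score = iou(ground_truth, prediction)
for threshold in (0.5, 0.75):
    print(f"IoU = {score:.2f}, match at threshold {threshold}: {score >= threshold}")
```

In this example the IoU is 0.6, so the detection counts as correct at the 0.5 threshold but not at 0.75. This is why ```mAP@.75``` is the stricter of the two scores.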

### Justification for the number of epochs used for training the object detection model
During training, one step took 1.95 seconds on average. An epoch consists of 28000 steps (batch_size=1), so one epoch takes approximately 15 hours.
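
The estimate above follows directly from the step time and the step count. As a quick check (the variable names are illustrative; the numbers are those quoted in the text):

```python
seconds_per_step = 1.95   # average step time reported above
steps_per_epoch = 28000   # one epoch with batch_size=1

epoch_seconds = seconds_per_step * steps_per_epoch
epoch_hours = epoch_seconds / 3600  # convert seconds to hours

print(f"One epoch takes about {epoch_hours:.1f} hours")
```

The result is slightly above 15 hours, consistent with the approximation given in the text.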

It was not possible to increase the batch size due to memory limitations. 

The experiment conducted in the cloud showed that, after 40 epochs, the model reached the same level of accuracy and noise as after 1 training epoch on the local machine. Training the model with a higher batch_size produced results faster (43 minutes).

However, this is not quite sufficient to obtain an optimal level of inference precision and the lowest noise possible.

#### **Comparison with Roboflow Cloud-based experiment (using the same model and dataset)**
The same model with almost the same pipeline was trained on Roboflow Cloud (detailed metrics can be found at the end of this notebook): 

- **Without data augmentation**: 300 epochs were needed to converge. The model achieved 95.6% mAP, 90.5% precision, and 91.2% recall, with an average class precision of 95% on the validation dataset.

<img src="https://storage.googleapis.com/roboflow-platform-cache/RjBpFWbVLQdI2NaOrqg24Eooatr2/qYHiTyjFVuJ6MWIK56Sh/2/results.png" width="800" />

- **With data augmentation**: using almost the same augmentation steps and the same hyperparameters (except num_epochs), 40 epochs were needed to converge. The model achieved **90.1% mAP, 88.4% precision, and 82.8% recall** with an **average class precision of 90%** on the validation dataset.

- We can conclude that data augmentation was necessary to reduce the number of epochs and mitigate the risk of over-fitting.

<img src="https://storage.googleapis.com/roboflow-platform-cache/RjBpFWbVLQdI2NaOrqg24Eooatr2/qYHiTyjFVuJ6MWIK56Sh/4/results.png" width="800" />

### Next

- Freeze the trained model in the correct format for inference
- Develop a Jupyter Notebook to perform inference on the frozen model using unseen test images
- Discuss my results.