# 7144COMP/CW2: Bird Multiple Object Detection Using Faster R-CNN 
## PART 2. Training
### Overview
In this notebook, I will train an object detection model using the pre-processed data from the previous notebook. 

- Download the object detection model from the [TensorFlow 2 Detection Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md).
- The model's hyperparameters and configuration are set in the ```fasterrcnn_config.config``` file. 
- The model is trained through this notebook using ```model_main_tf2.py``` with the relevant arguments.


#### Prerequisites
- Environment Setup (see Part 0)
- Data preprocessing (see Part 1)

## 1. Download the model from TensorFlow 2 Detection Model Zoo 
#### Import the necessary packages

In [77]:
import os
import re #<- regular expressions
import tensorflow as tf

#### Setup

In [78]:
# Define constants
RANDOM_SEED = 99 #<- ensures the reproducibility of the training results
BATCH_SIZE = 1
NUM_STEPS = 22000 # <- 6 epochs (21600 steps) plus 400 steps to generate the last ckpt for eval
NUM_EVAL_STEPS = 1000 #<- execute evaluation every 1000 steps

# Current directory
current_dir = os.getcwd()

#### Download Pre-trained ```Faster R-CNN ResNet101``` from TensorFlow 2 Detection Model Zoo 

**Why Faster R-CNN**?

Faster R-CNN is an object detection model that improves on Fast R-CNN by utilising a region proposal network (RPN) with the CNN model.

Faster R-CNN achieves strong detection performance in ordinary scenes ([source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582940/)).

However, its performance can still be unsatisfactory under certain conditions, such as when objects are occluded, deformed, or small ([source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582940/)).

Our project deals with ordinary scenes and, according to the requirements, should prioritise accuracy over speed. Two-stage object detectors such as Faster R-CNN may therefore be the most suitable for this task, given the limitations in terms of time and computing power.

In [56]:
# Download Faster R-CNN ResNet101 if it doesn't exist locally
if not os.path.isdir('faster_rcnn_resnet101_v1_640x640_coco17_tpu-8'):
    !wget http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz
    # Unzip and remove compressed files
    !tar -xf faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz
    # Cleanup
    !rm faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz

#### Load Train, Test, Valid TFRecords, labelmap

In [79]:
# Train, Test, Valid TFRecord files
train_record_path = os.path.join(current_dir, 'Birds', 'train', 'birds.tfrecord')
test_record_path = os.path.join(current_dir, 'Birds', 'test', 'birds.tfrecord')
valid_record_path = os.path.join(current_dir, 'Birds', 'valid', 'birds.tfrecord')

# Labelmap
labelmap_path = os.path.join(current_dir, 'Birds', 'train', 'birds_label_map.pbtxt')

## 2. Model's Config files, Checkpoints and Hyperparameters

In [58]:
# Path to the pre-trained checkpoint used as the starting point for fine-tuning
fine_tune_checkpoint_fasterrcnn = 'faster_rcnn_resnet101_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0'
print('Checkpoint Dir:', fine_tune_checkpoint_fasterrcnn)

Checkpoint Dir: faster_rcnn_resnet101_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0


In [59]:
# config files can be edited and updated on ayoubbensakhria/TensorFlowOD repository
if os.path.isfile('pipeline.config'):
    !rm 'pipeline.config'

# Download the latest base pipeline config file
!wget https://raw.githubusercontent.com/ayoubbensakhria/TensorFlowOD/master/7144COMP/training/pipeline.config

# The data_augmentation_options section has been removed because augmentation was already applied with Roboflow
base_config_path_fasterrcnn = 'pipeline.config'

--2023-01-06 18:27:19--  https://raw.githubusercontent.com/ayoubbensakhria/TensorFlowOD/master/7144COMP/training/pipeline.config
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3834 (3.7K) [text/plain]
Saving to: ‘pipeline.config’


2023-01-06 18:27:20 (108 MB/s) - ‘pipeline.config’ saved [3834/3834]



#### Hyperparameters

Our approach is to set a maximum number of epochs (10) and to monitor the validation loss. Training continues until the validation loss has not improved for a certain number of epochs, at which point it is stopped.

After training for 10 epochs, we observed that the validation loss did not improve after the 6th epoch.
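
The stopping rule described above can be sketched as follows. This is a minimal illustration and not code used in this notebook: the function name `best_epoch_with_patience`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical, chosen only so that the minimum falls on the 6th epoch.

```python
def best_epoch_with_patience(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is the first epoch at which the validation loss has failed
    to improve for `patience` consecutive epochs, or the last epoch if
    that never happens.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses for 10 epochs (illustrative values only).
example_losses = [0.92, 0.71, 0.60, 0.54, 0.50, 0.47, 0.48, 0.49, 0.50, 0.52]
best_epoch, stop_epoch = best_epoch_with_patience(example_losses, patience=2)
print(best_epoch, stop_epoch)  # 6 8
```

With these made-up values the best epoch is the 6th, and a patience of 2 would stop training at the 8th.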

```num_epochs=6```: an epoch is one forward pass and one backward pass over all the training examples. One step takes 2 seconds on average, and an epoch consists of 3600 steps (batch_size=1). 

However, for most real-world datasets, six epochs may not be sufficient to train a model to good performance, as the model may need more passes over the data to converge.

```batch_size=1```: the number of training examples in one forward/backward pass (one step). The higher the batch size, the more memory is needed. Here, the available memory allows a maximum batch size of 1.

```num_steps=22000```: the number of training iterations, each being a single update of the model weights. Six epochs correspond to 21600 steps; the extra 400 steps generate the last checkpoint for evaluation.
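
The step count used in the Setup cell (```NUM_STEPS = 22000```) follows from the figures above. The short calculation below reproduces it and gives a rough estimate of the training time. The constant names are mine, and the time estimate assumes the 2 seconds per step average holds for the whole run.

```python
STEPS_PER_EPOCH = 3600   # one epoch at batch_size=1, as stated above
NUM_EPOCHS = 6
EXTRA_STEPS = 400        # extra steps so that a final checkpoint is written for evaluation
SECONDS_PER_STEP = 2     # average step time reported above

epoch_steps = NUM_EPOCHS * STEPS_PER_EPOCH
total_steps = epoch_steps + EXTRA_STEPS
estimated_hours = total_steps * SECONDS_PER_STEP / 3600

print(epoch_steps, total_steps, round(estimated_hours, 1))  # 21600 22000 12.2
```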

```fixed_shape_resizer```: a fixed resolution of ```640x640 px``` is useful for ensuring that all input images have the same size, which can make them easier to process and may improve the performance of the model.

```grid_anchor_generator```: anchor boxes are used to identify potential object locations within the image. The performance of a Faster R-CNN model can be affected by the parameters of the grid anchor generator such as ```scales```, ```aspect_ratios```, ```height_stride```, ```width_stride```, and it may be necessary to experiment with different values to find the best performing configuration.
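
To get a feel for what ```scales``` and ```aspect_ratios``` produce, the sketch below enumerates the anchor shapes for the values used in this notebook's config. It rests on assumptions: a base anchor size of 256 px (the config shown here does not set it), an aspect ratio defined as width / height, and a feature map of ```image_size // stride``` cells per side. The function name `anchor_shapes` is hypothetical.

```python
import math


def anchor_shapes(scales, aspect_ratios, base_size=256):
    """Return (height, width) in pixels for every scale / aspect-ratio pair."""
    shapes = []
    for scale in scales:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            # The area stays (scale * base_size)^2 while width / height = ratio.
            height = scale * base_size / root
            width = scale * base_size * root
            shapes.append((round(height, 1), round(width, 1)))
    return shapes


scales = [0.25, 0.5, 1.0, 2.0]
aspect_ratios = [0.5, 1.0, 2.0]
shapes = anchor_shapes(scales, aspect_ratios)

image_size, stride = 640, 16
grid_cells = (image_size // stride) ** 2      # 40 x 40 anchor positions
total_anchors = grid_cells * len(shapes)      # anchors evaluated per image

for shape in shapes:
    print(shape)
print(len(shapes), grid_cells, total_anchors)
```

Under these assumptions there are 12 anchor shapes per position and 19200 anchors per image, which shows why changing the stride or adding scales quickly affects the workload of the first stage.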

```second_stage_post_processing```: takes the output of the model's second stage (the box classifier that refines and classifies the proposals coming from the region proposal network) and generates the final set of object detections. The specific parameters used can have a significant impact on the model's performance.
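
Post-processing in detectors of this kind typically combines score filtering with non-maximum suppression (NMS). The following is a plain-Python sketch of greedy NMS for a single class, not the implementation used by the Object Detection API. The threshold values, boxes and scores are hypothetical and are not taken from this notebook's config.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.6, score_threshold=0.3):
    """Return the indices of the boxes kept, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    candidates = sorted(
        (i for i, score in enumerate(scores) if score >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: two overlapping boxes, one separate box, one weak box.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.90, 0.80, 0.75, 0.10]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

Raising the IoU threshold keeps more overlapping boxes, which may help with birds that sit close together but also tends to produce duplicate detections.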

 
The hyperparameters above may be suitable for quickly testing the performance of a model on our dataset, but they may not be optimal for training a model to good performance on a real-world dataset.

In [60]:
# Function that edits the model pipeline config
def edit_config(model_name, base_config_path, fine_tune_checkpoint):
  with open(base_config_path) as f:
    config = f.read()

  with open('{model}_config.config'.format(model=model_name), 'w') as f:

    # Set labelmap path
    config = re.sub('label_map_path: ".*?"', 
              'label_map_path: "{}"'.format(labelmap_path), config)
    
    # Set fine_tune_checkpoint path
    config = re.sub('fine_tune_checkpoint: ".*?"',
                    'fine_tune_checkpoint: "{}"'.format(fine_tune_checkpoint), config)

    # Set train tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', 
                    'input_path: "{}"'.format(train_record_path), config)

    # Set test tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")', 
                    'input_path: "{}"'.format(test_record_path), config)

    # Set number of classes.
    config = re.sub('num_classes: [0-9]+',
                    'num_classes: {}'.format(4), config)

    # Set batch size
    config = re.sub('batch_size: [0-9]+',
                    'batch_size: {}'.format(BATCH_SIZE), config)

    # Set training steps
    config = re.sub('num_steps: [0-9]+',
                    'num_steps: {}'.format(NUM_STEPS), config)

    # Set fine-tune checkpoint type to detection
    config = re.sub('fine_tune_checkpoint_type: "classification"', 
              'fine_tune_checkpoint_type: "{}"'.format('detection'), config)

    f.write(config)
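
The substitutions in ```edit_config``` can be tried on a small in-memory snippet before touching the real file. The sketch below is only an illustration: the sample config text, the `/tmp/...` paths and the helper `set_field` are hypothetical. Unlike a plain ```re.sub```, the helper raises an error when a pattern matches nothing, and it passes the replacement through a function so that backslashes in a path are not interpreted as escapes.

```python
import re

# Hypothetical config fragment (not the actual contents of pipeline.config).
SAMPLE_CONFIG = """\
model { faster_rcnn { num_classes: 90 } }
train_config { batch_size: 64 num_steps: 100000 }
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord"
  }
}
"""


def set_field(config_text, pattern, replacement):
    """Replace every match of `pattern`; fail loudly if nothing matched."""
    new_text, count = re.subn(pattern, lambda _match: replacement, config_text)
    if count == 0:
        raise ValueError("pattern not found: {}".format(pattern))
    return new_text


edited = SAMPLE_CONFIG
edited = set_field(edited, r'num_classes: [0-9]+', 'num_classes: 4')
edited = set_field(edited, r'batch_size: [0-9]+', 'batch_size: 1')
edited = set_field(edited, r'num_steps: [0-9]+', 'num_steps: 22000')
edited = set_field(edited, r'label_map_path: ".*?"',
                   'label_map_path: "/tmp/birds_label_map.pbtxt"')
edited = set_field(edited, r'(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")',
                   'input_path: "/tmp/train/birds.tfrecord"')

# Any placeholder left over means a path was not configured.
leftover_placeholders = re.findall(r'PATH_TO_BE_CONFIGURED', edited)
print(edited)
print(leftover_placeholders)  # []
```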

In [61]:
# Edit config Faster R-CNN
edit_config('fasterrcnn', base_config_path_fasterrcnn, fine_tune_checkpoint_fasterrcnn)

# Clean up
!rm 'pipeline.config'

# Print config pipeline
%cat 'fasterrcnn_config.config'

# Faster R-CNN with Resnet-101 (v1)
# Trained on COCO, initialized from Imagenet classification checkpoint

# This config is TPU compatible.

model {
  faster_rcnn {
    num_classes: 4
    image_resizer {
      fixed_shape_resizer {
        width: 640
        height: 640
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101_keras'
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_los

## 3. Train Faster R-CNN ResNet101 Object Detector

- The validation script must run concurrently with the training script in order to visualise the validation loss curve on ```TensorBoard```. 
- The validation script should be listening for new checkpoints (output: ```Waiting for new checkpoint at``` ...) to execute validation every ```1000 steps``` (see Part 3). 
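
As a rough sketch of how the concurrent validation job can be launched: to my knowledge, ```model_main_tf2.py``` switches to evaluation mode when it is given a ```--checkpoint_dir``` flag, and then waits for new checkpoints in that directory. The helper `build_eval_command` below is hypothetical and only assembles the command line; the paths mirror the ones used in this notebook and are assumptions about the local layout.

```python
import os


def build_eval_command(script_path, pipeline_config_path, model_dir):
    """Assemble the argument list for an evaluation run (nothing is executed)."""
    return [
        "python", script_path,
        "--pipeline_config_path={}".format(pipeline_config_path),
        "--model_dir={}".format(model_dir),
        # Pointing checkpoint_dir at the training directory makes the script
        # evaluate each new checkpoint written by the training job.
        "--checkpoint_dir={}".format(model_dir),
        "--alsologtostderr",
    ]


current_dir = os.getcwd()
script_path = os.path.join(current_dir, "models", "research",
                           "object_detection", "model_main_tf2.py")
model_dir = os.path.join(current_dir, "training")

eval_command = build_eval_command(script_path, "fasterrcnn_config.config", model_dir)
print(" ".join(eval_command))
```

The printed command can be run from a second terminal (or passed to ```subprocess.Popen```) while the training cell is running.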

In [80]:
# Model training directory and config pipeline
model_dir = os.path.join(current_dir, 'training')
pipeline_config_path = 'fasterrcnn_config.config'

# Test training params
print (pipeline_config_path, model_dir, NUM_STEPS)

fasterrcnn_config.config /home/msc1/Desktop/7144COMP/Models/faster_rcnn_resnet101/training 22000


In [None]:
# Execute training
!python $current_dir/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=$pipeline_config_path \
    --model_dir=$model_dir \
    --alsologtostderr \
    --num_train_steps=$NUM_STEPS 

2023-01-08 14:34:01.975553: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-08 14:34:02.728050: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-08 14:34:02.728097: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-08 14:34:04.109195: E tensorflow/compiler/xla/stream_executor/cuda/c

The script above was used to train an object detection model using ```TensorFlow 2```. It takes in a pipeline configuration file, which specifies the model, the training configuration, and the training and evaluation data.

The script has several flags that can be used to control the training process. 

- ```--pipeline_config_path``` specifies the path to the pipeline configuration file, which defines the model architecture and training parameters. 

- ```--model_dir``` specifies the directory where the trained model and training logs should be saved.

- ```--alsologtostderr``` causes the training logs to be written to both the log file and the console.

- ```--num_train_steps``` specifies the number of training steps to run.
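
While training runs, the checkpoints written to ```--model_dir``` can be listed to see how far it has progressed. The sketch below assumes the usual ```ckpt-N.index``` file naming and is demonstrated on a temporary directory with made-up file names; `list_checkpoint_numbers` is a hypothetical helper, not part of the Object Detection API.

```python
import os
import re
import tempfile


def list_checkpoint_numbers(model_dir):
    """Return the sorted checkpoint numbers found in model_dir."""
    pattern = re.compile(r'^ckpt-(\d+)\.index$')
    numbers = []
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


# Demonstration on a temporary directory with made-up checkpoint files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["checkpoint", "ckpt-1.index", "ckpt-2.index",
                 "ckpt-10.index", "ckpt-10.data-00000-of-00001"]:
        open(os.path.join(demo_dir, name), "w").close()
    found = list_checkpoint_numbers(demo_dir)

print(found)  # [1, 2, 10]
```

Sorting the numbers as integers avoids the pitfall of ```ckpt-10``` being ordered before ```ckpt-2``` in a plain alphabetical listing.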


### Export our OD inference graph

Graphs are data structures that contain a set of ```tf.Operation``` objects, which represent units of computation; and ```tf.Tensor``` objects, which represent the units of data that flow between operations. 

Here we will save our object detection inference graph files in ```fasterrcnn_inference_graph/saved_model```.

The following cell uses the ```exporter_main_v2.py``` script from the TensorFlow object detection library to export the trained model. The script loads the trained model from the specified checkpoint directory and uses the pipeline configuration file to rebuild a model with the same architecture. The exported model is saved in the specified output directory.

The exported model is a copy of the trained model, converted to a format that is suitable for serving or for further training.

In [76]:
# Define the output directory
output_directory = 'fasterrcnn_inference_graph'

# Export OD inference graph
!python $current_dir/models/research/object_detection/exporter_main_v2.py \
    --trained_checkpoint_dir $model_dir \
    --output_directory $output_directory \
    --pipeline_config_path $pipeline_config_path

2023-01-08 14:15:23.888782: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-08 14:15:24.638287: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-08 14:15:24.638336: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-01-08 14:15:26.156981: E tensorflow/compiler/xla/stream_executor/cuda/c

- ```--trained_checkpoint_dir```: Directory containing the trained model checkpoints.
- ```--output_directory```: Directory where the exported model will be saved.
- ```--pipeline_config_path```: Path to the pipeline configuration file, which specifies the model architecture and other options.
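
After the export, a quick check of the output directory can confirm that the expected files are there. The list of expected entries below is an assumption based on what the exporter typically writes (a ```saved_model``` folder, a ```checkpoint``` folder and a copy of ```pipeline.config```); the helper `missing_export_entries` is hypothetical and is demonstrated on a temporary directory rather than on the real export.

```python
import os
import tempfile

# Assumed layout of the export directory (adjust if the export differs).
EXPECTED_ENTRIES = [
    os.path.join("saved_model", "saved_model.pb"),
    "checkpoint",
    "pipeline.config",
]


def missing_export_entries(output_directory):
    """Return the expected entries that are absent from output_directory."""
    return [entry for entry in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(output_directory, entry))]


# Demonstration: an incomplete export that lacks pipeline.config.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "saved_model"))
    os.makedirs(os.path.join(demo_dir, "checkpoint"))
    open(os.path.join(demo_dir, "saved_model", "saved_model.pb"), "w").close()
    missing = missing_export_entries(demo_dir)

print(missing)  # ['pipeline.config']
```

Calling ```missing_export_entries('fasterrcnn_inference_graph')``` after the export cell should return an empty list if the layout assumption holds.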

## Next Steps
- Evaluate the trained model using TensorBoard.