# 7144COMP/CW2: Bird Multiple Object Detection Using Faster R-CNN 
## PART 2.Training
### Overview
In this notebook, I will train an object detection model using the pre-processed data from the previous notebook. 

- Download the object detection models from Tensorflow 2 Detection Model Zoo >> [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md).
- The model's hyperparameters and configuration are set in the ```fasterrcnn_config.config``` file. 
- The model is trained through this notebook using ```model_main_tf2.py``` with the relevent arguments.


#### Prerequisites
- Environment Setup (see Part 0)
- Data preprocessing (see Part 1)

## 1. Download the model from TensorFlow 2 Detection Model Zoo 
#### Import the necessary packages

In [3]:
import os
import re #<- regular expressions
import tensorflow as tf
physical_devices = tf.config.list_physical_devices()
print("Num GPUs Available: ", len(physical_devices))

Num GPUs Available:  1


#### Setup

In [4]:
# Define constants
# RANDOM_SEED ensure the reproduciblity of training results
RANDOM_SEED = 99
# We have 4,000 images and use a batch size of 1 
# an epoch consists of:
#36,00 images / (1 images / step) = 3600 steps.
BATCH_SIZE = 1
NUM_STEPS = 28000 
NUM_EVAL_STEPS = 1000

EPOCHS = 1
# Current directory
current_dir = os.getcwd()

#### Download Fine-tuned Faster R-CNN ResNet101 model

***Why Faster R-CNN***?

Faster R-CNN is an object detection model that improves on Fast R-CNN by utilising a region proposal network (RPN) with the CNN model.

Faster R-CNN has impressive detection effects in ordinary scenes ([source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582940/)).

However, under certain conditions, there can still be unsatisfactory detection performance, such as: the object having problems like occlusion, deformation, or small size ([source](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582940/)).

Our project deals with ordinary scenes, depending on the requirements, we should prioritise accuracy over speed, therefore, two-step object detectors like Faster R-CNN may be the most suitable for this task given the limitations in terms of time and computing power.

In [27]:
# Download Faster R-CNN ResNet101 if it doesn't exist locally
if not os.path.isdir('faster_rcnn_resnet101_v1_640x640_coco17_tpu-8'):
    !wget http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz
    # Decompression and remove compressed files
    !tar -xf faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz
    # Cleanup
    !rm faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz

#### Load Train, Test, Valid TFRecords

In [26]:
# Train, Test, Valid TFRecord files
train_record_path = os.path.join(current_dir, 'Birds', 'train', 'birds.tfrecord')
test_record_path = os.path.join(current_dir, 'Birds', 'test', 'birds.tfrecord')
valid_record_path = os.path.join(current_dir, 'Birds', 'valid', 'birds.tfrecord')
# Labelmap
labelmap_path = os.path.join(current_dir, 'Birds', 'train', 'birds_label_map.pbtxt')

# 2. Model's Config files, Checkpoints and Hyperparameters

In [28]:
# Load the latest Checkpoint if it exists
# Default
fine_tune_checkpoint_fasterrcnn = 'faster_rcnn_resnet101_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0'
#if os.path.isdir('training'):
    #fine_tune_checkpoint_fasterrcnn = os.path.join(current_dir, 'training')
print('Checkpoint Dir:', fine_tune_checkpoint_fasterrcnn)

Checkpoint Dir: faster_rcnn_resnet101_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0


In [29]:
# config files can be edited and updated on ayoubbensakhria/TensorFlowOD repository
if os.path.isfile('pipeline.config'):
    !rm 'pipeline.config'
# Download the latest base pipeline config file
!wget https://raw.githubusercontent.com/ayoubbensakhria/TensorFlowOD/master/7144COMP/training/pipeline.config

# data_augmentation_options section has been removed because it has been done by Roboflow
base_config_path_fasterrcnn = 'pipeline.config'

--2022-12-21 22:33:01--  https://raw.githubusercontent.com/ayoubbensakhria/TensorFlowOD/master/7144COMP/training/pipeline.config
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3834 (3.7K) [text/plain]
Saving to: ‘pipeline.config’


2022-12-21 22:33:01 (115 MB/s) - ‘pipeline.config’ saved [3834/3834]



#### Hyperparameters

*num_epoch* = one forward pass and one backward pass of all the training examples. One step takes on average 2 seconds, an epoch consists of 28000 steps (batch_size=1), so the total duration of an epoch is approx 15 hours. 

*batch_size* = the number of training examples in one forward/backward pass (1 step). The higher the batch_size, the more memory space we would need. Here the available memory allows a max of batch_size = 1 

*num_steps*: number of iterations = number of passes, each pass using batch size number of 1 

Taking into consideration all the factors above, the following hyperparameters were set as follows:

- ```batch_size: 1 ```
- ```num_epochs: 1```
- ```num_steps: 28000```
- ```fixed_shape_resizer```: 640 px for height and width (resize input images) (refer to the preprocessing section for justification)

In [30]:
# Config Edition function
def edit_config(model_name, base_config_path, fine_tune_checkpoint):
  with open(base_config_path) as f:
    config = f.read()

  with open('{model}_config.config'.format(model=model_name), 'w') as f:

    # Set labelmap path
    config = re.sub('label_map_path: ".*£?"', 
              'label_map_path: "{}"'.format(labelmap_path), config)
    
    # Set min dimension
    config = re.sub('width: ".*?"', 
              'width: "{}"'.format(MIN_DIMENSION), config)
    
    # Set max dimension
    config = re.sub('height: ".*?"', 
              'height: "{}"'.format(MAX_DIMENSION), config)
    
    # Set fine_tune_checkpoint path
    config = re.sub('fine_tune_checkpoint: ".*?"',
                    'fine_tune_checkpoint: "{}"'.format(fine_tune_checkpoint), config)

    # Set train tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', 
                    'input_path: "{}"'.format(train_record_path), config)

    # Set test tf-record file path
    config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")', 
                    'input_path: "{}"'.format(test_record_path), config)

    # Set number of classes.
    config = re.sub('num_classes: [0-9]+',
                    'num_classes: {}'.format(4), config)

    # Set batch size
    config = re.sub('batch_size: [0-9]+',
                    'batch_size: {}'.format(BATCH_SIZE), config)

    # Set training steps
    config = re.sub('num_steps: [0-9]+',
                    'num_steps: {}'.format(NUM_STEPS), config)

    # Set fine-tune checkpoint type to detection
    config = re.sub('fine_tune_checkpoint_type: "classification"', 
              'fine_tune_checkpoint_type: "{}"'.format('detection'), config)

    f.write(config)

In [31]:
# Edit config Fatser R-CNN
edit_config('fasterrcnn', base_config_path_fasterrcnn, fine_tune_checkpoint_fasterrcnn)
# Clean up
!rm 'pipeline.config'
# Print config pipeline
%cat 'fasterrcnn_config.config'

# Faster R-CNN with Resnet-101 (v1)
# Trained on COCO, initialized from Imagenet classification checkpoint

# This config is TPU compatible.

model {
  faster_rcnn {
    num_classes: 4
    image_resizer {
      fixed_shape_resizer {
        width: 640
        height: 640
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101_keras'
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_los

# 3. Train Faster R-CNN ResNet101 Object Detector

In [32]:
# Model training directory and config pipeline
model_dir = os.path.join(current_dir, 'training')
pipeline_config_path = 'fasterrcnn_config.config'
# Test training params
print (pipeline_config_path, model_dir, NUM_STEPS)

fasterrcnn_config.config /home/msc1/Desktop/7144COMP/Models/faster_rcnn_resnet101/training 28000


In [33]:
# Execute training
!python $current_dir/models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=$pipeline_config_path \
    --model_dir=$model_dir \
    --alsologtostderr \
    --num_train_steps=$NUM_STEPS \
    --num_eval_steps=$NUM_EVAL_STEPS

2022-12-21 22:33:18.514153: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-21 22:33:19.284785: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-21 22:33:19.284833: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-21 22:33:20.797124: E tensorflow/compiler/xla/stream_executor/cuda/c

### Export our OD inference graph

- Graphs are data structures that contain a set of tf.Operation objects, which represent units of computation; and tf.Tensor objects, which represent the units of data that flow between operations. 

- Graphs are defined in a tf.Graph context. Since these graphs are data structures, they can be saved, run, and restored all without the original Python code.

- Here we will save our OD inference graph files in **fasterrcnn_inference_graph/saved_model** directory

In [34]:
output_directory = 'fasterrcnn_inference_graph'

!python $current_dir/models/research/object_detection/exporter_main_v2.py \
    --trained_checkpoint_dir $model_dir \
    --output_directory $output_directory \
    --pipeline_config_path $pipeline_config_path

2022-12-22 13:32:16.408113: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-22 13:32:17.489770: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-22 13:32:17.489831: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2022-12-22 13:32:19.437146: E tensorflow/compiler/xla/stream_executor/cuda/c

#### saved_model.pb file (fasterrcnn_inference_graph/saved_model.saved_model.pb)
.pb stands for Protocol Buffers, it is a language-neutral, platform-neutral extensible mechanism for serializing structured data (Our TF Graph). It is widely used in model deployment, such as fast inference tool TensorRT.

This file is loaded along with its dependencies by TensorFlow using tf.saved_model.load to make inferences (see deployment part 4).