# Model Training
Model training is handled by scripts provided by the TensorFlow object detection API, so this notebook does not contain any training code of its own. The process is, however, explained here.

### General Links
- [TensorFlow object detection API (git)](https://github.com/tensorflow/models/tree/master/research/object_detection)
- [TensorFlow object detection API (unofficial tutorial)](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/index.html)
- [TensorFlow object detection model zoo (COCO pretrained)](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md)

First, some variables and a utility function are defined that make it easier to execute the training script with the correct parameters. (This neat little trick is inspired by Christoph's and Linus's work.)

In [1]:
PATH_TO_MODEL_MAIN = '/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/scripts/model_main_tf2.py'
MODEL_DIR = '/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/models/ssd_resnet101_1024/v2/'
CONFIG_PATH = '/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/models/ssd_resnet101_1024/v2/pipeline.config'

2022-05-28 12:10:27.299606: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


In [3]:
def create_training_query():
    return f"python {PATH_TO_MODEL_MAIN} --model_dir={MODEL_DIR} --pipeline_config_path={CONFIG_PATH}"

In [4]:
create_training_query()

'python /home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/scripts/model_main_tf2.py --model_dir=/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/models/ssd_resnet101_1024/v2/ --pipeline_config_path=/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/models/ssd_resnet101_1024/v2/pipeline.config'
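The generated string can be pasted into a terminal. As a minimal sketch, assuming one would rather launch the job from Python, the same command could be built as an argument list and handed to `subprocess.run`. The helper name `build_training_command` and the relative paths are hypothetical placeholders, not part of the project.

```python
import shlex
import subprocess
import sys


def build_training_command(path_to_model_main, model_dir, config_path):
    """Return the training command as an argument list (hypothetical helper)."""
    return [
        sys.executable,  # the interpreter that runs this notebook
        path_to_model_main,
        f"--model_dir={model_dir}",
        f"--pipeline_config_path={config_path}",
    ]


# Placeholder paths for illustration only.
command = build_training_command(
    "scripts/model_main_tf2.py",
    "models/my_model/v1/",
    "models/my_model/v1/pipeline.config",
)
print(shlex.join(command))  # copy-pasteable form of the command

# Uncomment to actually start training (blocks until the script exits):
# subprocess.run(command, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when a path contains spaces or special characters.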

Next, a more detailed look at the training process will be provided. 
## 2D Bounding Box Object Detection
The task at hand is to detect objects in color images by drawing a rectangular bounding box around them. It is usually divided into two subtasks that are addressed individually during model design:
1. Bounding box regression, the task of identifying regions within the image that are likely to contain objects (localization)
2. Classification of those regions of interest (classification)
As our problem is single-class, we only have to detect whether a given region contains an object or not.
### Models
A multitude of models exist for bounding box object detection, some famous ones being
- Faster R-CNN
- SSD (Single Shot MultiBox Detector)
- EfficientDet
- YOLO (v1, v2, v3, v4, v5, X)

Among these, the newer YOLO models and EfficientDet are the best-performing, state-of-the-art detectors. All of the above are based on CNNs; recently, however, transformer-based approaches have been published as well.

The largest difference in model architecture is usually whether a single-stage or two-stage design is used. Single-stage detectors combine the localization and classification parts into a single network component, while two-stage detectors use separate localization and classification components.

Object detection models usually use the same feature extraction networks as their backbones as image classifiers do. These backbones are usually pretrained on ImageNet, a large-scale classification dataset. Object detectors themselves are usually pretrained and evaluated on the COCO dataset using the COCO mAP metric.

The TensorFlow object detection API offers a collection of object detection models pretrained on COCO which we will use in this project.
### Evaluation of 2D Object Detectors
After an object detector has been trained, its predictions are evaluated on a test or validation set. Each prediction consists of a class, a localization and a confidence. These values are compared to the provided ground truth. In object detection, there can be multiple instances of multiple classes within a single image. Thus, a simple measurement of classification accuracy is not appropriate (Everingham et al., 2010). Instead, both the localization and the classification of objects have to be evaluated. The localization is only considered if the prediction confidence for a proposed bounding box exceeds a defined threshold. In that case, the localization quality is measured by the intersection over union (IoU), which quantifies the overlap between the predicted bounding box $B_p$ and the ground-truth bounding box $B_{gt}$. With $area(B_p \cap B_{gt})$ being the area of the intersection and $area(B_p \cup B_{gt})$ the area of the union, the IoU is calculated as
$$IoU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}$$
(Everingham et al., 2010).
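The IoU formula can be sketched in a few lines of Python. The box layout `(xmin, ymin, xmax, ymax)` and the function name `iou` are assumptions made for this illustration; the evaluation code inside the TensorFlow object detection API is organized differently.

```python
def iou(box_p, box_gt):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_p[0], box_gt[0])
    iy_min = max(box_p[1], box_gt[1])
    ix_max = min(box_p[2], box_gt[2])
    iy_max = min(box_p[3], box_gt[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    intersection = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1 / 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```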
Based on the class confidence and the IoU, each detection outcome falls into one of three categories. A detection is a True Positive (TP) if both the confidence and the IoU exceed their defined thresholds, so that the prediction counts as correct. If the IoU is too low or zero, or the predicted class is wrong, the detection is a False Positive (FP). False Negatives (FN) are objects that are not detected at all or whose detections do not exceed the defined confidence threshold. Using these counts, the precision and recall can be defined as follows:
$$Precision = \frac{TP}{TP+FP},$$
$$Recall = \frac{TP}{TP+FN}.$$
The precision can be interpreted as the ratio of true detections over all detections while the recall measures how many objects are detected of all given objects in the ground truth. 
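The TP/FP/FN definitions above can be illustrated with a simplified, single-class matching routine. This is a sketch under stated assumptions: detections are `(score, box)` pairs, boxes are `(xmin, ymin, xmax, ymax)` tuples, and each detection is greedily matched to the best still-unmatched ground-truth box. The function names are hypothetical, and the official PASCAL VOC and COCO tools differ in their details.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(detections, ground_truths, iou_thr=0.5, score_thr=0.5):
    """Count TP, FP and FN for one image (single class, greedy matching)."""
    # Keep confident detections only and process the most confident first.
    kept = sorted((d for d in detections if d[0] >= score_thr),
                  key=lambda d: d[0], reverse=True)
    unmatched = list(ground_truths)
    tp = fp = 0
    for _, box in kept:
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_thr:
            unmatched.remove(best)  # each ground-truth box is matched once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)  # leftover ground truths are FNs


def precision_recall(tp, fp, fn):
    """Precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


ground_truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (1, 1, 10, 10)),     # overlaps the first object well
    (0.8, (50, 50, 60, 60)),   # confident, but hits nothing
    (0.3, (20, 20, 30, 30)),   # correct box, but below the score threshold
]
counts = count_tp_fp_fn(detections, ground_truths)
print(counts, precision_recall(*counts))
```

In this example the low-confidence third detection is discarded, so the second object counts as a false negative even though its box was located correctly.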

Since the introduction of the PASCAL VOC challenge (Everingham et al., 2010), the mean average precision (mAP) has been the de facto standard for measuring the performance of object detectors (Zou et al., 2019). The original average precision (AP) as defined by Everingham et al. (2010) is calculated as the mean precision over 11 different recall values:
$$AP = \frac{1}{11} \sum_{r \in \{0,0.1,...,1\}} p_{interp}(r)$$
with $r$ being the recall value and $p_{interp}$ being the interpolated precision at $r$. The AP is then averaged over all available classes N to obtain the mAP:
$$mAP = \frac{1}{N} \sum_{n \in N} AP_n.$$
Everingham et al. (2010) used an IoU threshold of 0.5 in their definition of the mAP. The MS COCO benchmark later extended this to the average over ten IoU thresholds (Lin et al., 2014):
$$COCO\;mAP = \frac{1}{10} \sum_{t \in \{0.5,0.55,...,0.95\}} mAP_t.$$
The COCO mAP has the advantage of rewarding detectors with better bounding box accuracy.
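The two averaging steps can be sketched directly from the formulas above. The function names are hypothetical, and the sketch assumes that the interpolated precision at recall $r$ is the maximum precision observed at any recall greater than or equal to $r$, as in the PASCAL VOC definition. The official COCO tooling samples the precision-recall curve on a finer recall grid, so its numbers will not match this sketch exactly.

```python
def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP from paired recall/precision values."""
    total = 0.0
    for i in range(11):
        r = i / 10  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision at any recall >= r.
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates, default=0.0)
    return total / 11


def coco_map(map_per_threshold):
    """Average of ten mAP values, one per IoU threshold 0.50, 0.55, ..., 0.95."""
    if len(map_per_threshold) != 10:
        raise ValueError("expected one mAP value per IoU threshold (10 values)")
    return sum(map_per_threshold) / 10


# Toy precision-recall curve with two operating points.
ap = average_precision_11pt(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
print(ap)  # (6 * 1.0 + 5 * 0.5) / 11
```

For a single-class problem such as this one, the mAP at a given IoU threshold is simply the AP of that one class.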

As the problem at hand is a single-class problem, averaging over multiple classes is not needed. However, since the COCO metrics are easily integrated into the TensorFlow object detection API, we use them anyway.

Sources:
- Everingham, M., van Gool, L., Williams, C. K. I., Winn, J. and Zisserman, A. (2010), ‘The PASCAL Visual Object Classes (VOC) challenge’, International Journal of Computer Vision 88(2), 303–338.
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C. L. (2014), ‘Microsoft COCO: Common objects in context’, in D. Fleet, T. Pajdla, B. Schiele and T. Tuytelaars, eds, ‘Computer Vision – ECCV 2014’, Springer International Publishing, Cham, pp. 740–755.
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C. L. (n.d.a), ‘Evaluations of detections on the MS COCO dataset’. URL: https://cocodataset.org/detection-eval
- Zou, Z., Shi, Z., Guo, Y. and Ye, J. (2019), ‘Object detection in 20 years: A survey’. URL: http://arxiv.org/pdf/1905.05055v2
## Model Training (TensorFlow OD API)
The TensorFlow object detection API follows four basic steps for model training:
### 1. Model Definition or Download
First, a model has to be either defined from scratch (out of scope for this project) or downloaded. For this project, pretrained model checkpoints for popular object detection models are downloaded from the [TensorFlow object detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md).
### 2. Training Configuration
Next, the training has to be configured in a training pipeline config file. All aspects that control the training are defined within this file. This includes general parameters such as batch size and learning rate, but also finer details such as data augmentation, model structure parameters, parallelization options, optimizer settings and evaluation settings. An example config file used for this project looks like this:
```protobuf
model {
  ssd {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 1024
        width: 1024
      }
    }
    feature_extractor {
      type: "ssd_resnet101_v1_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.0004
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.03
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.997
          scale: true
          epsilon: 0.001
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.0004
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.01
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.997
            scale: true
            epsilon: 0.001
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.6
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-08
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 4
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.0025
          total_steps: 15000
          warmup_learning_rate: 0.00083333
          warmup_steps: 1500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/pretrained-models/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 15000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 4
  max_number_of_boxes: 500
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: true
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/03_primary/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/05_model_input/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/03_primary/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/05_model_input/valid.record"
  }
}
```
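To get a feeling for the `cosine_decay_learning_rate` block in the config, the schedule can be sketched in plain Python. This is a minimal sketch that assumes a linear warmup from `warmup_learning_rate` to `learning_rate_base`, followed by a half-cosine decay to zero at `total_steps`. The function below is written for illustration and is not the API's own implementation, which may differ in details.

```python
import math


def cosine_decay_with_warmup(step,
                             learning_rate_base=0.0025,
                             total_steps=15000,
                             warmup_learning_rate=0.00083333,
                             warmup_steps=1500):
    """Learning rate at a given step (defaults taken from the config above)."""
    if step < warmup_steps:
        # Linear ramp from the warmup rate up to the base rate.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step > total_steps:
        return 0.0
    # Half-cosine decay from the base rate down to zero.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


for step in (0, 750, 1500, 8250, 15000):
    print(step, cosine_decay_with_warmup(step))
```

Under these assumptions the rate starts at the warmup value, peaks at the base rate once warmup ends, and is halved at the midpoint of the decay phase.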
The most important parameters for this project are the paths to the data (`input_path`), the pretrained checkpoint (`fine_tune_checkpoint`) and the label map (`label_map_path`).
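Because a wrong path typically only surfaces once the training script has started, it can help to check these paths beforehand. The following sketch reads the config as plain text with a regular expression. The helper names and the `/placeholder/...` paths are hypothetical. It relies on the fact that a TF2 checkpoint path is a prefix, so the presence of the accompanying `.index` file is tested instead.

```python
import os
import re

# Config fields that hold file system paths.
PATH_KEYS = ("fine_tune_checkpoint", "label_map_path", "input_path")


def find_config_paths(config_text):
    """Return (key, value) pairs for the path fields of a pipeline config."""
    pattern = re.compile(
        r'^\s*(%s):\s*"([^"]*)"' % "|".join(PATH_KEYS), re.MULTILINE
    )
    return pattern.findall(config_text)


def missing_paths(config_text):
    """Return the (key, value) pairs whose target does not exist on disk."""
    missing = []
    for key, value in find_config_paths(config_text):
        # Checkpoints are prefixes such as ".../ckpt-0"; look for the index file.
        target = value + ".index" if key == "fine_tune_checkpoint" else value
        if not os.path.exists(target):
            missing.append((key, value))
    return missing


# Shortened config with placeholder paths, for illustration only.
sample = '''
train_config {
  fine_tune_checkpoint: "/placeholder/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "/placeholder/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/placeholder/train.record"
  }
}
'''
for key, value in missing_paths(sample):
    print(f"missing {key}: {value}")
```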
### 3. Start Training and Evaluation Jobs
The API is designed to run separate processes for training and evaluation. Both can be started using a command similar to the one generated at the beginning of this notebook. The console log shows the progress of the training, but it is more practical to use TensorBoard instead.
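For the evaluation job, `model_main_tf2.py` accepts an additional `--checkpoint_dir` flag, which switches the script to evaluation-only mode and makes it evaluate the checkpoints found in that directory. A sketch of a matching helper, mirroring the training helper at the top of this notebook, could look as follows. The function name and the placeholder paths are hypothetical, and pointing `--checkpoint_dir` at the model directory is an assumption about the setup.

```python
def create_evaluation_query(path_to_model_main, model_dir, config_path):
    """Build the command line for an evaluation-only run (hypothetical helper)."""
    return (
        f"python {path_to_model_main} "
        f"--model_dir={model_dir} "
        f"--pipeline_config_path={config_path} "
        # Assumption: training writes its checkpoints to model_dir.
        f"--checkpoint_dir={model_dir}"
    )


# Placeholder paths for illustration only.
query = create_evaluation_query(
    "scripts/model_main_tf2.py",
    "models/my_model/v1/",
    "models/my_model/v1/pipeline.config",
)
print(query)
```

The evaluation process is usually started in a second terminal while training is still running, so that new checkpoints are evaluated as they appear.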
### 4. Use TensorBoard to Monitor Training Progress
The API automatically logs the most important loss values and evaluation metrics. TensorBoard makes it convenient to view those metrics as they are generated.



TensorBoard can also be viewed within a Jupyter notebook by using an iframe:

In [4]:
%load_ext tensorboard
%tensorboard --logdir /home/dammeier@ab.ba.ba-ravensburg.de/dev/oil-storage-detection/data/06_models/models/ssd_resnet101_1024/v2

Reusing TensorBoard on port 6006 (pid 1644137), started 0:01:33 ago. (Use '!kill 1644137' to kill it.)

# Training Results
For a detailed view of model performance, see the evaluation notebook.

# Difficulties Encountered During Project Completion
A multitude of problems were encountered with both the TensorFlow object detection API and the Bizon server. This is also why only a limited number of models and configurations could be tried in the available time. Some noteworthy problems that we encountered:
- A bug in our data preparation pipeline took multiple days to find. The problem was that the bounding box coordinates weren't scaled correctly.
- The TensorFlow object detection API relies heavily on the TensorFlow v1 compatibility mode of TF2. This caused a lot of problems during package installation and made it difficult to get a working conda environment.
- TensorFlow doesn't make it easy to access its object detection capabilities. The documentation for the object detection API is of poor quality and incomplete. PyTorch would have been more appropriate for the task, as most public object detection architectures use it for training.
- The Bizon server was at times fully occupied. This is especially frustrating if the occupant only utilizes the CPU, as the server is actually meant for highly parallelized GPU computing.