# TensorFlow Object Detection API and AWS SageMaker

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.

In [None]:
# %%capture
# %pip install tensorflow_io sagemaker -U

In [None]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

Save the IAM role in a variable called `role`. It is needed later when launching the training job.

In [None]:
role = sagemaker.get_execution_role()
print(role)

In [None]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
          'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://object-detection-prj/logs/'
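
A typo in an S3 path usually only surfaces once the training job has started. The sketch below is a small, optional sanity check that splits each URI into bucket and prefix before anything is launched. The helper name `split_s3_uri` is hypothetical and not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix); raise on anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Same values as in the cell above.
inputs = {
    "train": "s3://cd2688-object-detection-tf2/train/",
    "val": "s3://cd2688-object-detection-tf2/val/",
}

for channel, uri in inputs.items():
    bucket, prefix = split_s3_uri(uri)
    print(f"{channel}: bucket={bucket} prefix={prefix}")
```

This only checks the shape of the URI; it does not verify that the bucket exists or that the role can read it.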

## Container

To train the model, you first need to build a [Docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it to ECR
* print the container name

In [None]:
%%bash

# # clone the repo and get the scripts
# git clone https://github.com/tensorflow/models.git docker/models

# # get model_main and exporter_main files from TF2 Object Detection GitHub repository
# cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
# cp docker/models/research/object_detection/model_main_tf2.py source_dir

In [None]:
# # build and push the docker image. This code can be commented out after being run once.
# # This will take around 10 mins.
# image_name = 'tf2-object-detection'
# !sh ./docker/build_and_push.sh $image_name

To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/), you can look it up in the AWS console. For example, the screenshot below shows three different images pushed to ECR. You should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [None]:
# display the container name
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    # strip() drops the trailing newline and is safe if the file has none
    container = f.readline().strip()

print(container)
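
If the printed name looks wrong, it can help to break it into its parts. The sketch below assumes the standard ECR image URI layout, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The account ID in the example is a placeholder, and the helper name `parse_ecr_image` is hypothetical.

```python
import re

# Standard ECR image URI layout; other AWS partitions (for example, China regions)
# use a different domain suffix and are not covered by this sketch.
ECR_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+)(?::(?P<tag>.+))?$"
)


def parse_ecr_image(uri):
    """Return account, region, repository and tag (None if absent) of an ECR image URI."""
    match = ECR_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return match.groupdict()


# Placeholder account ID, for illustration only.
example = "123456789012.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:latest\n"
print(parse_ecr_image(example))
```

The repository field should read `tf2-object-detection` if the build cell above was run unchanged.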

## Pre-trained model from model zoo

As is common practice, we do not train from scratch; instead, we use a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
* Faster R-CNN ResNet50 V1 640x640
* EfficientDet D1 640x640
* Faster R-CNN ResNet152 V1 640x640

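In the code below, the Faster R-CNN ResNet152 V1 model is downloaded and extracted. The commented-out cells that follow show the equivalent commands for Faster R-CNN ResNet101 V1 and EfficientDet D1. Adjust this code if you experiment with other architectures.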

In [None]:
%%bash
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
wget -O /tmp/fasterrcnn152.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet152_v1_640x640_coco17_tpu-8.tar.gz
tar -zxvf /tmp/fasterrcnn152.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet152_v1_640x640_coco17_tpu-8/checkpoint


In [None]:
# %%bash
# mkdir /tmp/checkpoint
# mkdir source_dir/checkpoint
# wget -O /tmp/fasterrcnn.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.tar.gz
# tar -zxvf /tmp/fasterrcnn.tar.gz --strip-components 2 --directory source_dir/checkpoint faster_rcnn_resnet101_v1_640x640_coco17_tpu-8/checkpoint

In [None]:
# %%bash
# mkdir /tmp/checkpoint
# mkdir source_dir/checkpoint
# wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
# tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint
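
The three download cells above differ only in the archive name. As a sketch, the commands can be generated from that name. This assumes that other model zoo archives follow the same URL pattern and internal folder layout as the three used in this notebook, which should be checked against the model zoo page. The helper name `checkpoint_commands` is hypothetical.

```python
# URL prefix shared by the three archives downloaded in this notebook.
BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"


def checkpoint_commands(model_name, dest="source_dir/checkpoint"):
    """Return the shell commands that download and extract one pretrained checkpoint."""
    archive = f"/tmp/{model_name}.tar.gz"
    return [
        f"mkdir -p {dest}",
        f"wget -O {archive} {BASE_URL}/{model_name}.tar.gz",
        # Keep only the checkpoint/ folder and drop the two leading path components.
        f"tar -zxvf {archive} --strip-components 2 --directory {dest} {model_name}/checkpoint",
    ]


for command in checkpoint_commands("efficientdet_d1_coco17_tpu-32"):
    print(command)
```

The printed commands can be pasted into a `%%bash` cell. Clear `source_dir/checkpoint` before switching models so that files from two architectures are not mixed.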

## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) in the `source_dir` folder should be updated when you experiment with different models. The different config files are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You would need to modify it when working with other models.
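
When switching architectures, a handful of fields usually need to change in the new config, such as `num_classes`, `fine_tune_checkpoint` and `batch_size`. The sketch below edits such fields with a regular expression on the text of the file. The sample config and the new values (`3` classes, `checkpoint/ckpt-0`) are illustrative assumptions, not the contents of the provided `pipeline.config`, and the helper `set_field` is hypothetical.

```python
import re


def set_field(config_text, field, value):
    """Replace the value on every `field: ...` line of a text-format config."""
    pattern = re.compile(rf"^([ \t]*{re.escape(field)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash/group-reference surprises in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(f"field {field!r} not found in config")
    return new_text


# Hypothetical excerpt in the style of a model zoo config.
sample = """model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 128
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
"""

updated = set_field(sample, "num_classes", "3")
updated = set_field(updated, "fine_tune_checkpoint", '"checkpoint/ckpt-0"')
print(updated)
```

A regex edit is enough for simple scalar fields. For structural changes, editing the file by hand or through the API's protobuf classes is safer.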

## Launch Training Job

Now that we have a dataset, a Docker image and some pretrained model weights, we can launch the training job. To do so, we create a [SageMaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) estimator, where we specify the container name, the name of the config file, the number of training steps, and so on.

The `run_training.sh` script does the following:
* train the model for `num_train_steps` 
* evaluate over the val dataset
* export the model

Different metrics will be displayed during the evaluation phase, including the mean average precision. These metrics can be used to quantify your model's performance and to compare the different iterations.

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon SageMaker dashboard in the Web UI.

In [None]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir": "/opt/training",        
        "pipeline_config_path": "pipeline.config",
        "num_train_steps": "4000",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.g5.xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)
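
The hyperparameter keys above match flag names defined by `model_main_tf2.py`. The contents of `run_training.sh` are not shown in this notebook, so the sketch below only illustrates, as an assumption, how such a dictionary could map onto a training command line. The helper `build_training_command` is hypothetical.

```python
import shlex


def build_training_command(hyperparameters, script="model_main_tf2.py"):
    """Turn a hyperparameter dict into a `python <script> --key=value ...` command line."""
    flags = [f"--{key}={value}" for key, value in hyperparameters.items()]
    return shlex.join(["python", script, *flags])


# Same values as passed to the estimator above.
hyperparameters = {
    "model_dir": "/opt/training",
    "pipeline_config_path": "pipeline.config",
    "num_train_steps": "4000",
    "sample_1_of_n_eval_examples": "1",
}

print(build_training_command(hyperparameters))
```

Printing the command this way is a quick check for misspelled keys before paying for a training instance.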

You should be able to see your training job in the AWS console, as shown below:
![Training jobs example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the `pipeline.config` file to improve this model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the write-up.

Keep in mind that you can also:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Keep in mind that the `pipeline.config` file is specific to each architecture, so you will have to edit it.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.

In the cell below, write down all the approaches you have experimented with, why you chose them, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your write-up), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

# Experiment Results

## Run 1 - Baseline - EfficientDet D1 640x640

```
DONE (t=0.21s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.080
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.187
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.058
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.030
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.292
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.022
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.095
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.130
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.068
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.435
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.608
```
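
To compare runs without copying numbers by hand, the COCO summary printed in the training logs can be parsed into a dictionary. The sketch below assumes the log lines have exactly the layout shown above. The function name `parse_coco_summary` and the key layout are hypothetical choices.

```python
import re

# One summary line, e.g.
# " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.080"
LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=([\d.:]+)\s*\| "
    r"area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?[\d.]+)"
)


def parse_coco_summary(text):
    """Map (metric, IoU range, area, maxDets) to the reported value."""
    metrics = {}
    for metric, iou, area, max_dets, value in LINE.findall(text):
        metrics[(metric, iou, area, int(max_dets))] = float(value)
    return metrics


# First and last lines of the Run 1 summary above.
log = """
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.080
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.608
"""

for key, value in parse_coco_summary(log).items():
    print(key, value)
```

The key `("AP", "0.50:0.95", "all", 100)` corresponds to the headline mAP quoted in the summary at the end of this notebook.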

## Run 2

- SSD EfficientDet D1 640x640
- num_train_steps: 4000
- Lowered learning rate:
    ```
    optimizer {
      momentum_optimizer {
        learning_rate {
          cosine_decay_learning_rate {
            learning_rate_base: 0.007999999821186066
            total_steps: 4000
            warmup_learning_rate: 0.0010000000474974513
            warmup_steps: 150
          }
        }
        momentum_optimizer_value: 0.8999999761581421
      }
      use_moving_average: false
    }
    ```
- Added augmentations in the train config:

    ```
    data_augmentation_options {
      random_adjust_brightness {
      }
    }
    data_augmentation_options {
      random_patch_gaussian {
      }
    }
    data_augmentation_options {
      random_image_scale {
      }
    }
    data_augmentation_options {
      random_adjust_contrast {
      }
    }
    data_augmentation_options {
      random_adjust_saturation {
      }
    }
    ```
    
```
DONE (t=0.21s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.266
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.093
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.051
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.028
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.120
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.169
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.103
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.617
```
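
To see what the `cosine_decay_learning_rate` settings of this run imply at a given step, the schedule can be sketched in plain Python. The formulation below, linear warmup from `warmup_learning_rate` to `learning_rate_base` followed by a half-cosine decay to zero at `total_steps`, is a common one and is assumed here. The API's implementation may differ in edge cases, and the values are rounded from the config above.

```python
import math


def cosine_decay_with_warmup(step, base_lr, total_steps, warmup_lr, warmup_steps):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Rounded Run 2 settings.
settings = dict(base_lr=0.008, total_steps=4000, warmup_lr=0.001, warmup_steps=150)

for step in (0, 75, 150, 2075, 4000):
    print(step, round(cosine_decay_with_warmup(step, **settings), 6))
```

Because the decay is tied to `total_steps`, that field should be kept in sync with `num_train_steps` whenever the number of training steps changes.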

## Run 3

- Faster R-CNN ResNet101 V1 640x640
- num_train_steps: 4000
- Same learning rate as Run 2
- Same augmentations as Run 2

```
DONE (t=0.37s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.155
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.135
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.078
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.502
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.785
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.034
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.149
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.205
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.134
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.567
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.882
```

## Run 4

- Faster R-CNN ResNet152 V1 640x640
- num_train_steps: 4000
- Learning rate taken from the model's original training config, with the schedule adjusted to the number of fine-tuning steps:
    ```
    optimizer {
      momentum_optimizer {
        learning_rate {
          cosine_decay_learning_rate {
            learning_rate_base: .04
            total_steps: 4000
            warmup_learning_rate: .013333
            warmup_steps: 150
          }
        }
        momentum_optimizer_value: 0.8999999761581421
      }
      use_moving_average: false
    }
    ```
- Same augmentations as Run 2

```
DONE (t=0.36s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.122
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.245
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.107
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.029
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.122
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.170
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.103
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.712
```

## In summary

**Baseline (EfficientDet D1 640x640):**
The initial run with the baseline EfficientDet D1 model achieved a mean average precision (mAP) of about 0.080 (IoU=0.50:0.95). While this was a reasonable starting point, a closer look at object scale metrics indicated where the model struggled:

Small objects: AP ~0.030
The model found it challenging to detect small objects reliably, likely due to its feature pyramid design not sufficiently capturing fine-grained details at the smallest scales or insufficient training steps.

Medium objects: AP ~0.334
Medium-sized objects fared better as the model’s multi-scale features are better aligned with the visual granularity of these objects.

Large objects: AP ~0.292
Large objects scored far above small ones but, unexpectedly, below medium objects. This could be due to dataset characteristics, such as relatively few large objects, or to suboptimal hyperparameter tuning.

**EfficientDet D1 with Augmentations and Adjusted Learning Rate:**
Introducing targeted data augmentations (random brightness, contrast, saturation, scaling, and Gaussian noise patches) along with a reduced learning rate led to an improved mAP of about 0.118. Looking at the scales:

Small objects: AP ~0.051
Augmentations and a more stable learning rate helped the model learn more robust features, improving detection of small objects.

Medium objects: AP ~0.409
Medium-sized objects benefited substantially. Data augmentation likely helped the model generalize better to varying environmental conditions and slight scale differences.

Large objects: AP ~0.505
The largest gain was seen in large-object detection (from ~0.292 to ~0.505). The scaling augmentation possibly exposed the model to a wider range of object sizes, while the color augmentations (contrast, brightness, saturation) improved robustness to diverse lighting conditions.

**Why these Augmentations?**

- Random Brightness/Contrast/Saturation: Real-world scenes vary greatly in lighting and color conditions. Adjusting brightness, contrast, and saturation pushes the model to become invariant to these variations, reducing overfitting to a particular lighting scenario.
- Random Image Scaling: Scaling teaches the model to handle objects at different sizes, improving its ability to detect both small and large objects in real-world scenarios.
- Gaussian Patch Noise: Introducing local noise encourages the model to rely on structural patterns rather than clean pixels. This can improve robustness against sensor noise and compression artifacts often found in automotive imagery.
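
To make the photometric augmentations concrete, here is a minimal pure-Python sketch of what brightness and contrast adjustment do to pixel values in the range [0, 1]. It is an illustration under assumptions: the API applies its own implementations to image tensors, and the ranges used below (`max_delta=0.2`, contrast factor between 0.8 and 1.25) are example values rather than settings read from this project's config.

```python
import random


def clip(value):
    """Keep a pixel value inside [0, 1]."""
    return min(1.0, max(0.0, value))


def adjust_brightness(pixels, delta):
    """Shift every pixel by the same offset."""
    return [clip(p + delta) for p in pixels]


def adjust_contrast(pixels, factor):
    """Stretch (factor > 1) or squeeze (factor < 1) pixels around their mean."""
    mean = sum(pixels) / len(pixels)
    return [clip((p - mean) * factor + mean) for p in pixels]


def random_photometric(pixels, rng, max_delta=0.2, contrast_range=(0.8, 1.25)):
    """Apply a random brightness shift followed by a random contrast change."""
    delta = rng.uniform(-max_delta, max_delta)
    factor = rng.uniform(*contrast_range)
    return adjust_contrast(adjust_brightness(pixels, delta), factor)


row = [0.2, 0.5, 0.9]      # one row of a tiny grayscale image
rng = random.Random(0)     # seeded so the example is repeatable
print(random_photometric(row, rng))
```

Neither transform moves any pixel, so the bounding boxes stay valid. Geometric augmentations such as image scaling, by contrast, must transform the boxes as well.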

**Faster R-CNN ResNet101 V1 640x640:**
Switching to a two-stage architecture known for more robust feature extraction and region proposal methods, while keeping the same learning rate and augmentations as Run 2, yielded an mAP of about 0.155.

Small objects: AP ~0.078
The jump in small-object AP suggests that the more powerful backbone (ResNet101) combined with the region proposal mechanism of Faster R-CNN helps isolate smaller objects more effectively than the single-shot approach of EfficientDet.

Medium objects: AP ~0.502
Medium-sized objects again show a strong improvement. The deeper ResNet101 backbone offers richer feature representations that help differentiate medium-scale objects more clearly.

Large objects: AP ~0.785
Large-object detection experienced a dramatic improvement. This underscores the advantage of a two-stage detector like Faster R-CNN for objects that span a larger fraction of the image, as region proposals and subsequent refinement steps can more accurately localize bigger targets.

**Faster R-CNN ResNet152 V1 640x640:**
While we might expect a deeper backbone to improve results further, this run yielded an mAP of about 0.122, noticeably lower than the ResNet101 run (0.155). Note that this run also used a higher base learning rate (0.04 versus roughly 0.008), so the two runs differ in more than backbone depth.

Small objects: AP ~0.053
Performance on small objects dropped compared to ResNet101 (~0.078). This could be due to suboptimal hyperparameters or too few training steps, as deeper networks sometimes require more careful tuning or longer training.

Medium objects: AP ~0.411
Medium-object performance remained decent but did not match the gains seen with ResNet101.

Large objects: AP ~0.675
Although still strong, large-object detection did not reach the heights achieved by ResNet101. This suggests that simply increasing depth does not guarantee better performance without corresponding adjustments in optimization and augmentation strategies.
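
The comparisons above can be reproduced from the reported AP values (IoU=0.50:0.95, maxDets=100). The sketch below computes the relative change of each run against the baseline and picks the best run by overall mAP. The dictionary layout and the shortened run labels are choices made for this example.

```python
# AP values copied from the evaluation logs of each run.
runs = {
    "Run 1": {"all": 0.080, "small": 0.030, "medium": 0.334, "large": 0.292},
    "Run 2": {"all": 0.118, "small": 0.051, "medium": 0.409, "large": 0.505},
    "Run 3": {"all": 0.155, "small": 0.078, "medium": 0.502, "large": 0.785},
    "Run 4": {"all": 0.122, "small": 0.053, "medium": 0.411, "large": 0.675},
}


def relative_change(new, old):
    """Fractional change from `old` to `new` (0.5 means +50%)."""
    return (new - old) / old


baseline = runs["Run 1"]
for name, ap in runs.items():
    changes = {area: relative_change(ap[area], baseline[area]) for area in ap}
    summary = ", ".join(f"{area}: {change:+.0%}" for area, change in changes.items())
    print(f"{name} vs baseline -> {summary}")

best_run = max(runs, key=lambda name: runs[name]["all"])
print("Best overall mAP:", best_run)
```

Relative changes make the small-object numbers easier to read, since absolute gains there are tiny even when the AP more than doubles.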


The TensorBoard screenshot below shows the mAP for large, medium and small objects across the different runs:

![TensorBoard mAP by object size for each run](image.png)