# Training an SSD MobileNet V2 for Pedestrian Detection with TLT

This notebook shows how to train an SSD MobileNet V2 object detector for pedestrian detection with [NVIDIA's Transfer Learning Toolkit](https://developer.nvidia.com/transfer-learning-toolkit) (TLT).
TLT is a tool for training classification and object detection models with various pretrained backbones. Models trained with TLT can be easily integrated into and deployed with DeepStream.
Note that this notebook and its spec files are modified versions of the examples that ship with the [TLT docker container](https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics).

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
    1.1 [Prepare tfrecords from kitti format dataset](#head-1-1) <br>
    1.2 [Download pre-trained model](#head-1-2) <br>
2. [Provide training specification](#head-2)
3. [Run TLT training](#head-3)
4. [Evaluate trained models](#head-4)
5. [Prune trained models](#head-5)
6. [Retrain pruned models](#head-6)
7. [Evaluate retrained model](#head-7)
8. [Visualize inferences](#head-8)
9. [Deploy](#head-9)

## 0. Set up env variables <a class="anchor" id="head-0"></a>


In [None]:
# Setting up env variables for cleaner command line commands.
print("Please replace the variable with your key.")
%set_env KEY=[YOUR NGC API KEY]
%set_env USER_EXPERIMENT_DIR=/experiment_dir/ped_ssd_mobilenet_v2
%set_env DATASET_DIR=/experiment_dir/dataset
%set_env SPECS_DIR=/repo/training/tlt/pedestrian_detection/specs
%set_env TF_FORCE_GPU_ALLOW_GROWTH=true
!mkdir -p $USER_EXPERIMENT_DIR

## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will use the [Oxford Town Center Dataset](https://www.robots.ox.ac.uk/ActiveVision/Research/Projects/2009bbenfold_headpose/project.html) to train our pedestrian detector. First we download the video file and the annotation `csv` file with the `download_towncenter_video_and_labels.sh` script, then we create the images and KITTI format annotation files with [ffmpeg](https://ffmpeg.org/) and the `extract_kitti_labels.py` script. Note that TLT does not resize images dynamically at training time, so every image must be resized to a fixed size offline (for SSD detectors, the width and height of the images must be multiples of 32). Annotations for training object detectors must also be in KITTI format. For further information about creating KITTI format annotations, check out the [TLT getting started guide](https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html).
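
The two constraints above (dimensions that are multiples of 32, and KITTI format labels) can be sketched in a few lines. This is only an illustration and not the contents of `extract_kitti_labels.py`: the 1920x1080 source resolution, the `pedestrian` class name and the helper names are assumptions. It relies on the standard KITTI layout of 15 space-separated fields, where object detection training only needs the class name and the four bounding box values, so the remaining fields are written as zeros.

```python
# Sketch: scale a pixel-space box to the resized frame and emit a KITTI label line.
# The 1920x1080 source size and the "pedestrian" class name are assumptions.

def is_multiple_of_32(value):
    """Return True if value is a positive multiple of 32."""
    return value > 0 and value % 32 == 0

def scale_box(box, src_size, dst_size):
    """Scale (xmin, ymin, xmax, ymax) from a src (w, h) frame to a dst (w, h) frame."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

def kitti_line(class_name, box):
    """Format one KITTI label line (15 fields); unused fields are set to zero."""
    xmin, ymin, xmax, ymax = box
    fields = [class_name, "0.00", "0", "0.00",
              f"{xmin:.2f}", f"{ymin:.2f}", f"{xmax:.2f}", f"{ymax:.2f}",
              "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "0.00"]
    return " ".join(fields)

target_size = (320, 320)  # same size as passed to ffmpeg and the label script
assert all(is_multiple_of_32(v) for v in target_size)

box = scale_box((960, 540, 1152, 810), (1920, 1080), target_size)
print(kitti_line("pedestrian", box))
```

Because the frames are resized without preserving the aspect ratio, the horizontal and vertical scale factors differ, which is why the box is scaled per axis.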

In [None]:
# Download Oxford Town Center Dataset video file and csv annotation file
!bash ./download_towncenter_video_and_labels.sh

In [None]:
# Extract images from video, resize them and save in images directory with `ffmpeg`
!mkdir -p $DATASET_DIR/images
!ffmpeg -i $DATASET_DIR/TownCentreXVID.avi -q:v 1 -start_number 0 -vf scale=320:320 -frames:v 4501 $DATASET_DIR/images/%d.jpg

In [None]:
# Create KITTI annotation file and save in labels directory
!python3 extract_kitti_labels.py --annotation_path $DATASET_DIR/TownCentre-groundtruth.top --image_width 320 --image_height 320

In [None]:
# Verify that the video, images and labels are in the dataset directory
!ls -lh $DATASET_DIR/

### 1.1 Prepare tfrecords from kitti format dataset <a class="anchor" id="head-1-1"></a>

* Update the tfrecords spec file to take in your KITTI format dataset
* Create the tfrecords using `tlt-dataset-convert`
* TFRecords only need to be generated once.
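
Before converting, it can help to confirm that images and label files pair up by file name. The sketch below is a hypothetical helper, not part of TLT; it assumes the labels were written to `$DATASET_DIR/labels` as `.txt` files that share a stem with their image (for example `12.jpg` and `12.txt`).

```python
import os

def unmatched_stems(image_dir, label_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    images = {os.path.splitext(name)[0] for name in os.listdir(image_dir)
              if os.path.splitext(name)[1].lower() in image_exts}
    labels = {os.path.splitext(name)[0] for name in os.listdir(label_dir)
              if name.lower().endswith(".txt")}
    return sorted(images - labels), sorted(labels - images)

dataset_dir = os.environ.get("DATASET_DIR", "")
image_dir = os.path.join(dataset_dir, "images")
label_dir = os.path.join(dataset_dir, "labels")  # assumed label output location
if os.path.isdir(image_dir) and os.path.isdir(label_dir):
    missing_labels, missing_images = unmatched_stems(image_dir, label_dir)
    print(f"{len(missing_labels)} images without labels, "
          f"{len(missing_images)} labels without images")
```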

In [None]:
print("TFrecords conversion spec file for training")
!cat $SPECS_DIR/ssd_tfrecords_towncenter_trainval.txt

In [None]:
# Create a directory for the output tfrecords dump.
!mkdir -p $DATASET_DIR/tfrecords
# KITTI trainval
!tlt-dataset-convert -d $SPECS_DIR/ssd_tfrecords_towncenter_trainval.txt \
                     -o $DATASET_DIR/tfrecords/

In [None]:
!ls -rlt $DATASET_DIR/tfrecords/

### 1.2 Download pre-trained model <a class="anchor" id="head-1-2"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP on the navigation bar.

In [None]:
!ngc registry model list nvidia/tlt_pretrained_object_detection:*

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2/

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v2 --dest $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2

In [None]:
print("Check that model is downloaded into dir.")
!ls -lh $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2

## 2. Provide training specification <a class="anchor" id="head-2"></a>
* Tfrecords for the training dataset
    * To use the newly generated tfrecords, update the `dataset_config` parameter in the spec file at `$SPECS_DIR/ped_ssd_mobilenet_v2_train.txt`
    * Update the fold number to use for evaluation. For a random data split, use fold 0 only
    * For a sequence-wise split, you may use any fold generated by the dataset convert tool
* Pre-trained models
* Augmentation parameters for on the fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.

In [None]:
!cat $SPECS_DIR/ped_ssd_mobilenet_v2_train.txt

## 3. Run TLT training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training may take anywhere from several hours to a day to complete

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tlt-train ssd -e $SPECS_DIR/ped_ssd_mobilenet_v2_train.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
               -k $KEY \
               -m $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2/mobilenet_v2.hdf5 \
               --gpus 1

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
# Note csv epoch number is 1 less than model file epoch. For example, epoch 79 in csv corresponds to _080.tlt
!cat $USER_EXPERIMENT_DIR/experiment_dir_unpruned/ssd_training_log_mobilenet_v2.csv

In [None]:
# Set the epoch number which has best performance
%set_env EPOCH=060
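
Picking the epoch by eye from the csv is error-prone because of the off-by-one between the csv epoch and the checkpoint file name. A minimal sketch of that selection follows. The column names `epoch` and `mAP` are assumptions about the log layout, as is the idea that epochs without validation show up as `nan`; check the header of your own csv and adjust.

```python
import csv
import io
import math

def best_checkpoint_epoch(csv_text, metric="mAP", epoch_col="epoch"):
    """Return the zero-padded checkpoint epoch with the highest metric value."""
    best_epoch, best_value = None, -math.inf
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row[metric])
        except (KeyError, TypeError, ValueError):
            continue  # metric missing or not numeric for this epoch
        if math.isnan(value):
            continue  # assumed: epochs without validation may be logged as nan
        if value > best_value:
            best_epoch, best_value = int(row[epoch_col]), value
    if best_epoch is None:
        raise ValueError(f"no numeric '{metric}' values found")
    # The csv epoch is one less than the epoch in the model file name.
    return f"{best_epoch + 1:03d}"

# Made-up log excerpt for illustration only.
sample = "epoch,loss,mAP\n58,1.9,nan\n59,1.8,0.71\n60,1.8,nan\n79,1.7,0.69\n"
print(best_checkpoint_epoch(sample))  # prints 060
```

To use it on the real log, read the file contents with `open(path).read()` and pass the result as `csv_text`, then set `EPOCH` to the returned string.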

## 4. Evaluate trained models <a class="anchor" id="head-4"></a>

In [None]:
!tlt-evaluate ssd -e $SPECS_DIR/ped_ssd_mobilenet_v2_train.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
                  -k $KEY

## 5. Prune trained models <a class="anchor" id="head-5"></a>
* Specify pre-trained model
* Equalization criterion (only applies to ResNets and MobileNets, which contain element-wise operations)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `-pth` (threshold) to trade accuracy against model size. A higher `pth` gives a smaller model (and thus faster inference) but lower accuracy. The best threshold depends on the dataset and the model; `0.5` in the block below is just a starting point. If the accuracy after retraining is good, you can increase this value to get a smaller model. Otherwise, lower it to recover accuracy.
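
To build intuition for why a higher threshold yields a smaller model, here is a toy illustration of magnitude-based filter pruning. It is not TLT's implementation: it simply assumes that each filter is scored by its L2 norm normalized by the largest norm in the layer, and that filters scoring below the threshold are dropped. The weights are made up.

```python
import math

def filters_kept(filters, threshold):
    """Return indices of filters whose normalized L2 norm is >= threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    max_norm = max(norms)
    if max_norm == 0:
        return []
    return [i for i, n in enumerate(norms) if n / max_norm >= threshold]

# Four made-up filters with three weights each.
toy_filters = [
    [0.9, -0.8, 0.7],
    [0.05, 0.02, -0.01],
    [0.3, 0.3, -0.3],
    [0.6, 0.6, 0.1],
]

for pth in (0.1, 0.5, 0.8):
    kept = filters_kept(toy_filters, pth)
    print(f"pth={pth}: keep {len(kept)} of {len(toy_filters)} filters {kept}")
```

In this toy layer, raising the threshold from 0.1 to 0.5 to 0.8 keeps 3, then 2, then 1 filter, which mirrors the size versus accuracy trade-off described above.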

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_mobilenet_v2_pruned.tlt \
           -eq intersection \
           -pth 0.5 \
           -k $KEY

In [None]:
!ls -rlt $USER_EXPERIMENT_DIR/experiment_dir_pruned/

## 6. Retrain pruned models <a class="anchor" id="head-6"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification
* WARNING: training may take anywhere from several hours to a day to complete

In [None]:
# Printing the retrain spec file. 
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!cat $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tlt-train ssd --gpus 1 \
               -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_retrain \
               -m $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_mobilenet_v2_pruned.tlt \
               -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlht $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
# Note csv epoch number is 1 less than model file epoch. For example, epoch 79 in csv corresponds to _080.tlt
!cat $USER_EXPERIMENT_DIR/experiment_dir_retrain/ssd_training_log_mobilenet_v2.csv

In [None]:
# Set the epoch number which has best performance
%set_env EPOCH=035

## 7. Evaluate retrained model <a class="anchor" id="head-7"></a>

In [None]:
!tlt-evaluate ssd -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
                  -k $KEY

## 8. Visualize inferences <a class="anchor" id="head-8"></a>
In this section, we run the tlt-infer tool to generate inferences on the trained models and visualize the results.

In [None]:
# Running inference for detection on n images
!tlt-infer ssd -i $DATASET_DIR/images \
               -o $USER_EXPERIMENT_DIR/ssd_infer_images \
               -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
               -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
               -l $USER_EXPERIMENT_DIR/ssd_infer_labels \
               -k $KEY

The `tlt-infer` tool produces two outputs.
1. Images with the detected bounding boxes overlaid, in `$USER_EXPERIMENT_DIR/ssd_infer_images`
2. Frame-by-frame bbox labels in KITTI format, in `$USER_EXPERIMENT_DIR/ssd_infer_labels`
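
The label files can also be inspected programmatically, for example to count detections per frame. The sketch below assumes the standard KITTI field order (class name first, bounding box in fields 5 to 8) and ignores any trailing fields, since the exact number of fields written by `tlt-infer` may vary between versions. The sample text is made up.

```python
def parse_kitti_labels(text):
    """Parse KITTI label text into a list of (class_name, (xmin, ymin, xmax, ymax))."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        bbox = tuple(float(v) for v in fields[4:8])
        boxes.append((fields[0], bbox))
    return boxes

# Made-up label file contents for one frame.
sample = (
    "pedestrian 0.00 0 0.00 160.00 160.00 192.00 240.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00\n"
    "pedestrian 0.00 0 0.00 10.50 20.00 42.00 96.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00\n"
)
detections = parse_kitti_labels(sample)
print(f"{len(detections)} detections, first: {detections[0]}")
```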

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $USER_EXPERIMENT_DIR/image_dir in a grid."""
    output_path = os.path.join(os.environ['USER_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    image_paths = [os.path.join(output_path, image) for image in os.listdir(output_path)
                   if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(image_paths[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols  # integer division: array indices must be ints
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'ssd_infer_images' # relative path from $USER_EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## 9. Deploy! <a class="anchor" id="head-9"></a>

In [None]:
!mkdir -p $USER_EXPERIMENT_DIR/export
# Export in FP16 mode. Change --data_type to fp32 for FP32 mode
!tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/ped_ssd_mobilenet_v2_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
                --batch_size 1 \
                --data_type fp16

# Uncomment to export in INT8 mode (generate calibration cache file).
# !tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt  \
#                 -o $USER_EXPERIMENT_DIR/export/ped_ssd_mobilenet_v2_epoch_$EPOCH.etlt \
#                 -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
#                 -k $KEY \
#                 --cal_image_dir  $DATASET_DIR/images \
#                 --data_type int8 \
#                 --batch_size 1 \
#                 --batches 10 \
#                 --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
#                 --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

`Note:` In this example, for ease of execution we restrict the number of calibrating batches to 10. TLT recommends the use of at least 10% of the training dataset for int8 calibration.
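
As a rough worked example of the 10% recommendation, the number of calibration batches follows from the training set size and the batch size. The sketch below uses integer arithmetic to avoid rounding surprises. The figure of 4501 images is the number of frames extracted earlier and is only an upper bound here, since the actual training split is smaller once validation data is held out.

```python
def calibration_batches(num_train_images, batch_size, percent=10):
    """Return the batches needed to cover `percent` of the training images."""
    # Ceiling division with integers: -(-a // b) == ceil(a / b).
    needed_images = -(-num_train_images * percent // 100)
    return -(-needed_images // batch_size)

print(calibration_batches(4501, 1))  # prints 451
print(calibration_batches(4501, 8))  # prints 57
```

With `--batch_size 1`, this suggests a `--batches` value in the hundreds instead of 10 for a dataset of this size.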

In [None]:
print('Exported model:')
print('------------')
!ls -lh $USER_EXPERIMENT_DIR/export

#### You can use this `.etlt` file directly in DeepStream, or first create a TensorRT engine from it and then use that engine in DeepStream.