In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Object Detection using TAO RetinaNet

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use, Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/embedded-transfer-learning-toolkit-software-stack-1200x670px.png" width="1080">

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained ResNet-18 model and train a ResNet-18 RetinaNet model on the KITTI dataset
* Prune the trained RetinaNet model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run inference on the trained model
* Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream
* Run inference on the exported .etlt model to verify deployment using TensorRT

## Table of Contents

This notebook shows an example use case of RetinaNet object detection using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
    1.1 [Download the dataset](#head-1-1) <br>
    1.2 [Validate the downloaded dataset](#head-1-2) <br>
    1.3 [Generate tfrecords from kitti format dataset](#head-1-3) <br>
    1.4 [Download pre-trained model](#head-1-4) <br>
2. [Setup GPU environment](#head-2) <br>
    2.1 [Connect to GPU Instance](#head-2-1) <br>
    2.2 [Mounting Google drive](#head-2-2) <br>
    2.3 [Setup Python environment](#head-2-3) <br>
    2.4 [Reset env variables](#head-2-4) <br>
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Prune trained models](#head-6)
7. [Retrain pruned models](#head-7)
8. [Evaluate retrained model](#head-8)
9. [Visualize inferences](#head-9)

## 0. Set up env variables <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, make sure to set the `$KEY` environment variable to the key given in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

*Note: Remove any stray artifacts or files left over from previous experiments in the `$EXPERIMENT_DIR` and `$DATA_DIR` paths set below. Leftover files such as checkpoints may interfere with creating the training graph for a new experiment.*
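
One way to spot leftovers before starting is to list checkpoint-like files under the experiment directory. The sketch below is only an illustration: the helper name `find_stray_artifacts` and the extension list are assumptions, not part of TAO, and the listing will also include files you want to keep (for example a downloaded pretrained `.hdf5` model), so review it before deleting anything.

```python
import os

# Hypothetical list of checkpoint-like extensions; adjust it to your experiments.
CHECKPOINT_EXTS = {".tlt", ".hdf5", ".ckpt"}


def find_stray_artifacts(root, exts=CHECKPOINT_EXTS):
    """Return the sorted paths under `root` whose extension is in `exts`."""
    found = []
    # os.walk yields nothing if `root` does not exist, so the result is then empty.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in exts:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Example usage once EXPERIMENT_DIR has been set:
# for path in find_stray_artifacts(os.environ["EXPERIMENT_DIR"]):
#     print(path)
```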

In [None]:
# Setting up env variables for cleaner command line commands.
import os

print("Please replace the variable with your key.")
%env KEY=YOUR_KEY
%env EXPERIMENT_DIR=/results/retinanet
%env DATA_DIR=/content/drive/MyDrive/pointpillars_data
%env SPECS_DIR=/content/drive/MyDrive/ColabNotebooks/tensorflow/retinanet/specs

# Showing list of specification files.
!ls -rlt $SPECS_DIR

## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will be using the KITTI detection dataset for this tutorial. For more details, please visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip) and labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip) to `$DATA_DIR`.

### 1.1. Download the dataset <a class="anchor" id="head-1-1"></a>
Once you have received the download links by email, substitute them for `KITTI_IMAGES_DOWNLOAD_URL` and `KITTI_LABELS_DOWNLOAD_URL`. The next cells download the data and place it in `$DATA_DIR`.

In [None]:
# Create local dir
!mkdir -p $DATA_DIR
!mkdir -p $EXPERIMENT_DIR

In [None]:
import os
os.environ["URL_IMAGES"]=KITTI_IMAGES_DOWNLOAD_URL
!if [ ! -f $DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 
os.environ["URL_LABELS"]=KITTI_LABELS_DOWNLOAD_URL
!if [ ! -f $DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $DATA_DIR/data_object_label_2.zip; else echo "label archive already downloaded"; fi 

### 1.2. Validate the downloaded dataset <a class="anchor" id="head-1-2"></a>

In [None]:
# Check the dataset is present
!if [ ! -f $DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi

In [None]:
# This may take a while: verify integrity of zip files 
!sha256sum $DATA_DIR/data_object_image_2.zip | cut -d ' ' -f 1 | grep -xq '^351c5a2aa0cd9238b50174a3a62b846bc5855da256b82a196431d60ff8d43617$' ; \
if test $? -eq 0; then echo "images OK"; else echo "images corrupt, redownload!" && rm -f $DATA_DIR/data_object_image_2.zip; fi 
!sha256sum $DATA_DIR/data_object_label_2.zip | cut -d ' ' -f 1 | grep -xq '^4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880$' ; \
if test $? -eq 0; then echo "labels OK"; else echo "labels corrupt, redownload!" && rm -f $DATA_DIR/data_object_label_2.zip; fi 
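
The same integrity check can be done from Python when `sha256sum` is not available. This is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `verify_archive` are made up for illustration, and the expected digests are the ones already shown in the cell above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks avoids loading a multi-gigabyte archive into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Return True if the file's SHA-256 digest matches `expected_hex`."""
    return sha256_of_file(path) == expected_hex.lower()


# Example usage (path assumes DATA_DIR is set as in this notebook):
# ok = verify_archive(
#     os.path.join(os.environ["DATA_DIR"], "data_object_label_2.zip"),
#     "4efc76220d867e1c31bb980bbf8cbc02599f02a9cb4350effa98dbb04aaed880",
# )
```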

In [None]:
# unpack 
!unzip -u $DATA_DIR/data_object_image_2.zip -d $DATA_DIR
!unzip -u $DATA_DIR/data_object_label_2.zip -d $DATA_DIR

In [None]:
# verify
!ls -l $DATA_DIR

In [None]:
# Generate val dataset out of training dataset
%cd /content/drive/MyDrive/ColabNotebooks/tensorflow/retinanet/specs
!python3 generate_val_dataset.py --input_image_dir=$DATA_DIR/training/image_2 \
                                 --input_label_dir=$DATA_DIR/training/label_2 \
                                 --output_dir=$DATA_DIR/val

### 1.3 Generate tfrecords from kitti format dataset <a class="anchor" id="head-1-3"></a>

- Update the TFRecords spec file to point to your KITTI-format dataset
- Create the TFRecords using the `dataset_convert` command

*Note: TFRecords only need to be generated for the training set once.*

In [None]:
print("TFRecords conversion spec file:")
!cat $SPECS_DIR/retinanet_tfrecords_kitti_train.txt

In [None]:
# Creating a new directory for the output tfrecords dump.
print("Converting the training set to TFRecords.")
!mkdir -p $DATA_DIR/tfrecords && rm -rf $DATA_DIR/tfrecords/*
!tao retinanet dataset_convert \
               -d $SPECS_DIR/retinanet_tfrecords_kitti_train.txt \
               -o $DATA_DIR/tfrecords/kitti_train

In [None]:
!ls -rlt $DATA_DIR/tfrecords/

### 1.4 Download pre-trained model <a class="anchor" id="head-1-4"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env LOCAL_PROJECT_DIR=/content/
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u -q "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))
!cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /content/ngccli/ngc-cli/libstdc++.so.6

In [None]:
!ngc registry model list nvidia/tao/pretrained_object_detection:*

In [None]:
!mkdir -p $EXPERIMENT_DIR/pretrained_resnet18/

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_object_detection:resnet18 \
                    --dest $EXPERIMENT_DIR/pretrained_resnet18

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18

## 2. Setup GPU environment <a class="anchor" id="head-2"></a>


### 2.1 Connect to GPU Instance <a class="anchor" id="head-2-1"></a>

1. Move any data saved to the Colab instance storage to Google Drive
2. Change the runtime type to GPU: Runtime (top-left tab) -> Change Runtime Type -> GPU (Hardware Accelerator)
3. Click Connect (top right)



### 2.2 Mounting Google drive <a class="anchor" id="head-2-2"></a>
Mount your Google Drive storage on this Colab instance.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### 2.3 Setup Python environment <a class="anchor" id="head-2-3"></a>
Set up the environment needed to run the TAO networks by running the setup script.

In [None]:
!sh /content/drive/MyDrive/ColabNotebooks/tensorflow/setup_env.sh

In [None]:
import os
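# Use .get() so this does not raise KeyError when PYTHONPATH is not set yet.
os.environ["PYTHONPATH"] = os.environ.get("PYTHONPATH", "") + ":/opt/nvidia/"
os.environ["PYTHONPATH"] += ":/usr/local/lib/python3.6/dist-packages/third_party/nvml"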

### 2.4 Reset env variables <a class="anchor" id="head-2-4"></a>

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env KEY=nvidia_tlt
%env NUM_GPUS=1
%env EXPERIMENT_DIR=/results/retinanet
%env DATA_DIR=/content/drive/MyDrive/pointpillars_data

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/retinanet

%env SPECS_DIR=/content/drive/MyDrive/ColabNotebooks/tensorflow/retinanet/specs

# Showing list of specification files.
!ls -rlt $SPECS_DIR

## 3. Provide training specification <a class="anchor" id="head-3"></a>
* Pre-trained models
* Augmentation parameters for on the fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate etc.
* *Note* that in the provided spec file, anchor boxes are set to have 3 aspect ratios (`aspect_ratios_global`) and 3 anchor sizes (`n_anchor_levels`) per feature map cell.
* *Note* that the provided spec file uses `batch_size_per_gpu: 24`, which assumes at least 16 GB of GPU memory. If you need to change the batch size, adjust the learning rate accordingly.
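
The two notes above can be made concrete with a little arithmetic. The sketch below assumes that the number of anchors per cell is the product of the two settings, and it uses the linear scaling heuristic (learning rate proportional to batch size), which is a common rule of thumb rather than something the TAO spec prescribes. The base learning rate in the usage comment is a placeholder, not a value taken from the spec file.

```python
def anchors_per_cell(num_aspect_ratios, num_anchor_sizes):
    """Anchors per feature map cell, assuming one anchor per (ratio, size) pair."""
    return num_aspect_ratios * num_anchor_sizes


def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep the ratio lr / batch_size constant."""
    return base_lr * new_batch_size / base_batch_size


# 3 aspect ratios and 3 anchor sizes, as in the provided spec file.
print(anchors_per_cell(3, 3))

# Halving the batch size from 24 to 12 halves the (placeholder) learning rate.
# print(scale_learning_rate(base_lr=1e-4, base_batch_size=24, new_batch_size=12))
```

Treat the scaled value as a starting point and confirm it against the training log, since the best learning rate also depends on the schedule in the spec file.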

In [None]:
!cat $SPECS_DIR/retinanet_train_resnet18_kitti.txt

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training can take from several hours to a full day to complete

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
!sed -i "s|YOUR_PRETRAINED_MODEL|$EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5|g" $SPECS_DIR/retinanet_train_resnet18_kitti.txt
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao retinanet train -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
                     -r $EXPERIMENT_DIR/experiment_dir_unpruned \
                     -k $KEY \
                     --gpus 1

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_unpruned_qat
print("To run with QAT enabled, please uncomment and run the following command.")
# !sed -i "s/enable_qat: False/enable_qat: True/g" $SPECS_DIR/retinanet_train_resnet18_kitti.txt
# !tao retinanet train -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
#                      -r $EXPERIMENT_DIR/experiment_dir_unpruned_qat \
#                      -k $KEY \
#                      --gpus 1

In [None]:
print("To resume training from a checkpoint, you need to update the spec file.")
print("use resume_model_path instead of pretrain_model_path with the checkpoint path you wish to resume from.")

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# You can check the evaluation stats in the csv file and pick the model with highest val accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_unpruned/retinanet_training_log_resnet18.csv
%set_env EPOCH=100
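
Instead of reading the log by eye, the best epoch can be picked programmatically. This is a sketch only: the column names passed in (`epoch` and the validation metric) are assumptions, so check the header of your own CSV file first. Rows where the metric is missing or not a number are skipped.

```python
import csv
import math


def best_epoch(csv_path, metric, epoch_column="epoch"):
    """Return (epoch, score) for the row with the highest `metric`, or None."""
    best = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                score = float(row.get(metric))
            except (TypeError, ValueError):
                continue  # metric missing or not numeric on this row
            if math.isnan(score):
                continue  # e.g. epochs where validation was not run
            if best is None or score > best[1]:
                best = (int(float(row[epoch_column])), score)
    return best


# Example usage ("mAP" is a hypothetical column name):
# log = os.path.join(os.environ["EXPERIMENT_DIR"],
#                    "experiment_dir_unpruned/retinanet_training_log_resnet18.csv")
# print(best_epoch(log, metric="mAP"))
```

Before setting `EPOCH` from the result, confirm that the epoch numbering in the log matches the numbering in the checkpoint file names.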

## 5. Evaluate trained models <a class="anchor" id="head-5"></a>

In [None]:
!tao retinanet evaluate -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
                        -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
                        -k $KEY

## 6. Prune trained models <a class="anchor" id="head-6"></a>
* Specify pre-trained model
* Equalization criterion (only for ResNets and MobileNets, which have element-wise operations)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `-pth` (the threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus faster inference) but worse accuracy. The right threshold depends on the dataset and the model; `0.4` in the block below is just a starting point. If the accuracy after retraining is good, you can increase this value to get a smaller model. Otherwise, lower it to get better accuracy.
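
To build intuition for the trade-off, here is a toy illustration of threshold-based channel pruning. It is not TAO's implementation: it assumes each channel has a single norm value, normalizes by the largest norm, and keeps channels whose normalized norm is at least the threshold. The function name and the norm values are made up for the example.

```python
def channels_to_keep(channel_norms, threshold):
    """Return indices of channels whose norm / max norm is >= threshold."""
    max_norm = max(channel_norms)
    if max_norm == 0:
        return []  # nothing to rank against in this toy model
    return [i for i, norm in enumerate(channel_norms)
            if norm / max_norm >= threshold]


# Made-up norms for an 8-channel layer.
norms = [0.05, 0.90, 0.30, 1.00, 0.45, 0.10, 0.60, 0.35]
for pth in (0.2, 0.4, 0.6):
    kept = channels_to_keep(norms, pth)
    print(f"pth={pth}: keep {len(kept)} of {len(norms)} channels")
```

Raising the threshold never increases the number of channels kept, which is why a higher `pth` yields a smaller model.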

In [None]:
# Create an output directory to save the pruned model.
!mkdir -p $EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao retinanet prune -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
               -o $EXPERIMENT_DIR/experiment_dir_pruned/retinanet_resnet18_pruned.tlt \
               -pth 0.4 \
               -k $KEY

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_pruned_qat
print("To prune a QAT model:")
# !tao retinanet prune -m $EXPERIMENT_DIR/experiment_dir_unpruned_qat/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
#                -o $EXPERIMENT_DIR/experiment_dir_pruned_qat/retinanet_resnet18_pruned.tlt \
#                -pth 0.4 \
#                -k $KEY

In [None]:
!ls -rlt $EXPERIMENT_DIR/experiment_dir_pruned/

## 7. Retrain pruned models <a class="anchor" id="head-7"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-training specification
* WARNING: training can take from several hours to a full day to complete

In [None]:
# Printing the retrain spec file. 
# Here we update the spec file to use the newly pruned model as the pretrained weights.
!sed -i "s|YOUR_PRETRAINED_MODEL|$EXPERIMENT_DIR/experiment_dir_pruned/retinanet_resnet18_pruned.tlt|g" $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt
!cat $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights.
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao retinanet train --gpus 1 \
                     -e $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt \
                     -r $EXPERIMENT_DIR/experiment_dir_retrain \
                     -k $KEY

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_retrain_qat
print("To run with QAT enabled, please uncomment and run the following command.")
# !sed -i "s/enable_qat: False/enable_qat: True/g" $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt
# !tao retinanet train --gpus 1 \
#                      -e $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt \
#                      -r $EXPERIMENT_DIR/experiment_dir_retrain_qat \
#                      -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlt $EXPERIMENT_DIR/experiment_dir_retrain/weights
# !ls -rlt $EXPERIMENT_DIR/experiment_dir_retrain_qat/weights

In [None]:
# You can check the evaluation stats in the csv file and pick the model with highest val accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_retrain/retinanet_training_log_resnet18.csv
%set_env EPOCH=010

## 8. Evaluate retrained model <a class="anchor" id="head-8"></a>

In [None]:
!tao retinanet evaluate -e $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt \
                        -m $EXPERIMENT_DIR/experiment_dir_retrain/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
                        -k $KEY

## 9. Visualize inferences <a class="anchor" id="head-9"></a>
In this section, we run the `tao retinanet inference` tool to generate inferences with the retrained model and visualize the results.

In [None]:
# Running inference for detection on n images
!tao retinanet inference -i $DATA_DIR/testing/image_2 \
                         -o $EXPERIMENT_DIR/retinanet_infer_images \
                         -e $SPECS_DIR/retinanet_retrain_resnet18_kitti.txt \
                         -m $EXPERIMENT_DIR/experiment_dir_retrain/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
                         -l $EXPERIMENT_DIR/retinanet_infer_labels \
                         -k $KEY

The `inference` tool produces two outputs:
1. Annotated images with overlaid bounding boxes in `$EXPERIMENT_DIR/retinanet_infer_images`
2. Frame-by-frame bounding-box labels in KITTI format in `$EXPERIMENT_DIR/retinanet_infer_labels`
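
The label files can be read back for further analysis. The sketch below assumes the standard KITTI object label layout: 15 whitespace-separated fields with the class name first and the 2D box (left, top, right, bottom) in fields 5 to 8, optionally followed by a confidence score as a 16th field. The `KittiBox` type and `parse_kitti_line` helper are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Optional


class KittiBox(NamedTuple):
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    score: Optional[float]  # None when the line has no 16th field


def parse_kitti_line(line):
    """Parse one KITTI-format label line into a KittiBox."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    score = float(fields[15]) if len(fields) > 15 else None
    return KittiBox(fields[0], xmin, ymin, xmax, ymax, score)


# Example with a made-up detection line:
sample = "Car 0.00 0 0.00 100.0 120.5 200.0 220.5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87"
print(parse_kitti_line(sample))
```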

In [None]:
# Simple grid visualizer
!pip3 install matplotlib==3.3.3
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=1, num_images=1):
    """Show up to num_images images from $EXPERIMENT_DIR/image_dir in a grid."""
    output_path = os.path.join(os.environ['EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even with a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort the listing so the same images are shown on every run.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'retinanet_infer_images' # relative path from $EXPERIMENT_DIR.
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)