## Get the TensorRT tar file before running this Notebook

1. Visit https://developer.nvidia.com/tensorrt
2. Click `Download now`. This takes you to https://developer.nvidia.com/nvidia-tensorrt-download, where you must log in to (or join) the NVIDIA Developer Program
3. On the download page, choose TensorRT 8 from the available versions
4. Agree to the Terms and Conditions
5. Click on TensorRT 8.5 GA to expand the available options
6. Click on 'TensorRT 8.5 GA for Linux x86_64 and CUDA 11.0, 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7 and 11.8 TAR Package' to download the TAR file
7. Upload the tar file to your Google Drive

## Connect to GPU Instance

1. Change the runtime type to GPU: Runtime (top-left tab) -> Change Runtime Type -> GPU (Hardware Accelerator)
1. Then click Connect (top right)

## Mounting Google Drive
Mount your Google Drive storage to this Colab instance.

In [None]:
import sys
if 'google.colab' in sys.modules:
    %env GOOGLE_COLAB=1
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
else:
    %env GOOGLE_COLAB=0
    print("Warning: Not a Colab Environment")

# Object Detection using TAO DSSD

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained ResNet-18 model and train a ResNet-18 DSSD model on the KITTI dataset
* Prune the trained DSSD model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run inference on the trained model
* Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream

## Table of Contents

This notebook shows an example use case of DSSD object detection using the Train Adapt Optimize (TAO) Toolkit.

1. [Set up env variables and set FIXME parameters](#head-1)
2. [Prepare dataset and pre-trained model](#head-2) <br>
    2.1 [Download the dataset and the pre-trained model](#head-2-1) <br>
3. [Setup GPU environment](#head-3) <br>
    3.1 [Setup Python environment](#head-3-1) <br>
4. [Generate tfrecords](#head-4)
5. [Provide training specification](#head-5)
6. [Run TAO training](#head-6)
7. [Evaluate trained models](#head-7)
8. [Prune trained models](#head-8)
9. [Retrain pruned models](#head-9)
10. [Evaluate retrained model](#head-10)
11. [Visualize inferences](#head-11)
12. Model Export
13. Verify deployed model

#### Note
1. By default, this notebook is set up to run training on 1 GPU. To use more GPUs, update the env variable `$NUM_GPUS` accordingly
1. This notebook uses the KITTI dataset by default, which is about 12 GB. If you are limited by Google Drive storage, we recommend that you:

    i. Download the dataset onto your local system

    ii. Run the utility script at $COLAB_NOTEBOOKS/tensorflow/utils/generate_kitti_subset.py on your local system

    iii. This generates a subset of the KITTI dataset with the number of sample images you choose

    iv. Upload this subset to Google Drive

1. With the default config/spec file provided in this notebook, each DSSD weight file created during training is about 157 MB
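
To make the subsetting idea in the note above concrete, here is a minimal sketch of copying a random sample of image/label pairs into a new folder. It is not the repository's `generate_kitti_subset.py`, whose interface is not shown here; the function name `make_subset`, its parameters, and the `images`/`labels` folder layout are assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def make_subset(src_dir, dst_dir, num_samples, seed=0):
    """Copy a random sample of image/label pairs from src_dir to dst_dir.

    Hypothetical helper: assumes src_dir contains `images/` and `labels/`
    folders whose files share the same base name.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    (dst / "images").mkdir(parents=True, exist_ok=True)
    (dst / "labels").mkdir(parents=True, exist_ok=True)

    # Keep only images that have a label file with the same base name.
    pairs = []
    for image in sorted((src / "images").iterdir()):
        label = src / "labels" / (image.stem + ".txt")
        if image.is_file() and label.is_file():
            pairs.append((image, label))

    # A fixed seed makes the selection reproducible.
    chosen = random.Random(seed).sample(pairs, min(num_samples, len(pairs)))
    for image, label in chosen:
        shutil.copy2(image, dst / "images" / image.name)
        shutil.copy2(label, dst / "labels" / label.name)
    return len(chosen)
```

For example, `make_subset("kitti/training", "kitti_subset/training", 500)` would copy at most 500 pairs; the paths here are placeholders.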

## 1. Set up env variables and set FIXME parameters <a class="anchor" id="head-1"></a>

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, update the env variables `$NUM_GPUS` and `$GPU_INDEX` accordingly*

#### FIXME
1. NUM_GPUS - set this to a value <= the number of GPUs available on the instance
1. GPU_INDEX - set this to the indices of the GPUs available on the instance
1. COLAB_NOTEBOOKS_PATH - for a Google Colab environment, set this to the path where you want to clone the repo; for a local system environment, set this to the path of the already cloned repo
1. EXPERIMENT_DIR - set this to the folder where pretrained models, checkpoints and log files from the different model actions will be saved
1. delete_existing_experiments - set this to True to remove the pretrained models, checkpoints and log files of a previous experiment
1. DATA_DIR - set this to the folder where you want the dataset to be stored
1. delete_existing_data - set this to True to remove existing preprocessed and original data
1. trt_tar_path - set this to the path of the TensorRT tar.gz file you uploaded after downloading it in the browser
1. trt_untar_folder_path - set this to the folder into which the TensorRT tar.gz file should be extracted
1. trt_version - set this to the version of TRT you have downloaded

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env TAO_DOCKER_DISABLE=1

%env KEY=nvidia_tlt
#FIXME1
%env NUM_GPUS=1
#FIXME2
%env GPU_INDEX=0

#FIXME3
%env COLAB_NOTEBOOKS_PATH=/content/drive/MyDrive/nvidia-tao
if os.environ["GOOGLE_COLAB"] == "1":
    # Clone the notebooks repo only if it is not already present.
    if not os.path.exists(os.environ["COLAB_NOTEBOOKS_PATH"]):
        !git clone https://github.com/NVIDIA-AI-IOT/nvidia-tao.git $COLAB_NOTEBOOKS_PATH
else:
    if not os.path.exists(os.environ["COLAB_NOTEBOOKS_PATH"]):
        raise Exception("Error, enter the path of the colab notebooks repo correctly")

#FIXME4
%env EXPERIMENT_DIR=/content/drive/MyDrive/results/dssd
#FIXME5
delete_existing_experiments = True
#FIXME6
%env DATA_DIR=/content/drive/MyDrive/kitti_data/
#FIXME7
delete_existing_data = False

if delete_existing_experiments:
    !sudo rm -rf $EXPERIMENT_DIR
if delete_existing_data:
    !sudo rm -rf $DATA_DIR

SPECS_DIR=f"{os.environ['COLAB_NOTEBOOKS_PATH']}/tensorflow/dssd/specs"
%env SPECS_DIR={SPECS_DIR}
# Showing list of specification files.
!ls -rlt $SPECS_DIR

!sudo mkdir -p $DATA_DIR && sudo chmod -R 777 $DATA_DIR
!sudo mkdir -p $EXPERIMENT_DIR && sudo chmod -R 777 $EXPERIMENT_DIR

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

In this notebook we use NVIDIA-created synthetic object detection data in the KITTI dataset format. For more details about the KITTI format, see the [KITTI 2D object detection benchmark page](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).

**If using custom dataset; it should follow this dataset structure**
```
$DATA_DIR/training
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
│   ├── ...
└── labels
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...
$DATA_DIR/val
├── images
│   ├── image_name_5.jpg
│   ├── image_name_6.jpg
│   ├── ...
└── labels
    ├── image_name_5.txt
    ├── image_name_6.txt
    ├── ...
```
Each image in the `images` folder must have a label file with the same base name in the `labels` folder.
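
A quick way to check the naming rule on a custom dataset is to compare base names across the two folders. The sketch below is illustrative only; the helper name `find_unmatched` and the set of accepted image extensions are assumptions, not part of TAO.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_unmatched(split_dir):
    """Return (images without labels, labels without images) for one split."""
    split = Path(split_dir)
    images = {p.stem for p in (split / "images").iterdir()
              if p.suffix.lower() in IMAGE_EXTS}
    labels = {p.stem for p in (split / "labels").iterdir()
              if p.suffix.lower() == ".txt"}
    # Set differences expose files present on only one side.
    return sorted(images - labels), sorted(labels - images)
```

Both returned lists should be empty for a well-formed split; any name that appears points to a missing image or label file.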

### 2.1 Download the dataset <a class="anchor" id="head-2-1"></a>

In [None]:
!python3 -m pip install awscli
!aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_train.tar.gz $DATA_DIR/
!aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_val.tar.gz $DATA_DIR/

!mkdir -p $DATA_DIR/train/ && rm -rf $DATA_DIR/train/*
!mkdir -p $DATA_DIR/val/ && rm -rf $DATA_DIR/val/*

!tar -xzf $DATA_DIR/tao_od_synthetic_train.tar.gz -C $DATA_DIR/train/
!tar -xzf $DATA_DIR/tao_od_synthetic_val.tar.gz -C $DATA_DIR/val/

### 2.1 Download pre-trained model <a class="anchor" id="head-2-1"></a>

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env LOCAL_PROJECT_DIR=/ngc_content/
%env CLI=ngccli_cat_linux.zip
!sudo mkdir -p $LOCAL_PROJECT_DIR/ngccli && sudo chmod -R 777 $LOCAL_PROJECT_DIR

# Remove any previously existing CLI installations
!sudo rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u -q "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))
!cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 $LOCAL_PROJECT_DIR/ngccli/ngc-cli/libstdc++.so.6

In [None]:
!ngc registry model list nvidia/tao/pretrained_object_detection:*

In [None]:
!mkdir -p $EXPERIMENT_DIR/pretrained_resnet18/

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_object_detection:resnet18 --dest $EXPERIMENT_DIR/pretrained_resnet18

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18

## 3. Setup GPU environment <a class="anchor" id="head-3"></a>


### 3.1 Setup Python environment <a class="anchor" id="head-3-1"></a>
Set up the environment needed to run the TAO networks by running the bash script.

In [None]:
# FIXME 8: set this to the path of the TensorRT tar.gz file you uploaded after downloading it in the browser
trt_tar_path="/content/drive/MyDrive/TensorRT-8.5.1.7.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz"

import os
if not os.path.exists(trt_tar_path):
  raise Exception("TAR file not found in the provided path")

# FIXME 9: set this to the folder into which the TensorRT tar.gz file should be extracted
%env trt_untar_folder_path=/content/trt_untar
# FIXME 10: set this to the version of TRT you have downloaded
%env trt_version=8.5.1.7

!mkdir -p $trt_untar_folder_path

# Skip extraction if a matching TensorRT folder is already present.
untar = True
for fname in os.listdir(os.environ["trt_untar_folder_path"]):
  if fname.startswith("TensorRT-"+os.environ["trt_version"]) and not fname.endswith(".tar.gz"):
    untar = False

if untar:
  # Extract into the folder chosen in FIXME 9 rather than a hard-coded path.
  !tar -xzf $trt_tar_path -C $trt_untar_folder_path
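
Since `trt_version` has to agree with the tar file name, one option is to derive it from the file name instead of typing it twice. This is a small sketch that assumes the tar file follows the `TensorRT-<version>.<platform>...` naming seen in the default `trt_tar_path` above; the helper name `trt_version_from_tar` is hypothetical.

```python
import re


def trt_version_from_tar(tar_name):
    """Extract a dotted version such as '8.5.1.7' from a TensorRT tar name.

    Assumes the name looks like 'TensorRT-<version>.<platform>...'.
    Returns None when the name does not match that pattern.
    """
    match = re.search(r"TensorRT-(\d+(?:\.\d+)+)\.", tar_name)
    return match.group(1) if match else None


name = "TensorRT-8.5.1.7.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz"
print(trt_version_from_tar(name))  # prints 8.5.1.7
```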

In [None]:
import os
if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["bash_script"] = "setup_env.sh"
else:
    os.environ["bash_script"] = "setup_env_desktop.sh"

!sed -i "s|PATH_TO_TRT|$trt_untar_folder_path|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|TRT_VERSION|$trt_version|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|PATH_TO_COLAB_NOTEBOOKS|$COLAB_NOTEBOOKS_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

!sh $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

## 4. Generate tfrecords <a class="anchor" id="head-4"></a>
* Convert the training split of the dataset to TFRecords

In [None]:
print("TFRecords conversion spec file:")
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/dssd_tfrecords_kitti_train.txt
!cat $SPECS_DIR/dssd_tfrecords_kitti_train.txt
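
The `sed -i "s|TAO_DATA_PATH|...|g"` call above is a plain placeholder substitution. The Python sketch below mirrors that behavior on a string to show one consequence of editing in place. The spec line is made up for illustration and the helper name `fill_placeholders` is an assumption.

```python
def fill_placeholders(text, values):
    """Replace every occurrence of each placeholder, like sed 's|KEY|value|g'."""
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text


# Hypothetical spec line; the real spec files may look different.
spec_line = 'root_directory_path: "TAO_DATA_PATH/train"'
filled = fill_placeholders(spec_line, {"TAO_DATA_PATH": "/data/kitti"})
print(filled)

# A second pass with a different value changes nothing, because the
# placeholder is already gone after the first substitution.
assert fill_placeholders(filled, {"TAO_DATA_PATH": "/other"}) == filled
```

The same holds for the in-place `sed` edit: if you change `$DATA_DIR` after running the cell once, the spec file keeps the old path until you restore the original file or edit the path by hand.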

In [None]:
# Creating a new directory for the output tfrecords dump.
print("Converting the training set to TFRecords.")
!mkdir -p $DATA_DIR/tfrecords && sudo rm -rf $DATA_DIR/tfrecords/*
!tao model dssd dataset_convert \
          -d $SPECS_DIR/dssd_tfrecords_kitti_train.txt \
          -o $DATA_DIR/tfrecords/kitti_train

In [None]:
!ls -rlt $DATA_DIR/tfrecords/

## 5. Provide training specification <a class="anchor" id="head-5"></a>
The training spec file defines:
* The training dataset
    * To use the newly generated dataset, update the dataset_config parameter in the spec file at `$SPECS_DIR/dssd_train_resnet18_kitti.txt`
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.
* Whether to use quantization-aware training (QAT)
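
For intuition about the QAT option, the toy sketch below shows the "fake quantization" idea commonly used in quantization-aware training: values are rounded to an integer grid and mapped back to floats, so the network sees quantization error during training. It is a conceptual illustration only, not TAO's implementation; the function name, the symmetric 8-bit range and the example scale are assumptions.

```python
def fake_quantize(x, scale, num_bits=8):
    """Round x to a signed integer grid and map it back to a float.

    Toy illustration: symmetric quantization with a fixed, assumed scale.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    q = round(x / scale)                    # nearest integer level
    q = max(-qmax, min(qmax, q))            # clamp to the representable range
    return q * scale                        # back to floating point


print(fake_quantize(0.26, scale=0.1))   # close to 0.3: snapped to the grid
print(fake_quantize(100.0, scale=0.1))  # close to 12.7: clamped at the top level
```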

In [None]:
# To enable QAT training on sample spec file, uncomment following lines
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/dssd_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/dssd_retrain_resnet18_kitti.txt

In [None]:
# By default, the sample spec file disables QAT training. You can force non-QAT training by running lines below
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/dssd_train_resnet18_kitti.txt
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/dssd_retrain_resnet18_kitti.txt

In [None]:
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/dssd_train_resnet18_kitti.txt
!cat $SPECS_DIR/dssd_train_resnet18_kitti.txt

## 6. Run TAO training <a class="anchor" id="head-6"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: training can take anywhere from several hours to a full day to complete

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multiple GPUs, set NUM_GPUS and GPU_INDEX in section 1 to match the GPUs available on your machine.")
!tao model dssd train --gpus $NUM_GPUS --gpu_index=$GPU_INDEX \
                -e $SPECS_DIR/dssd_train_resnet18_kitti.txt \
                -r $EXPERIMENT_DIR/experiment_dir_unpruned \
                -k $KEY \
                -m $EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5

In [None]:
print("To resume from checkpoint, please uncomment and run this instead. Change last two arguments accordingly.")
# !tao model dssd train --gpus 1 --gpu_index=$GPU_INDEX \
#                 -e $SPECS_DIR/dssd_train_resnet18_kitti.txt \
#                 -r $EXPERIMENT_DIR/experiment_dir_unpruned \
#                 -k $KEY \
#                 -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/dssd_resnet18_epoch_001.tlt \
#                 --initial_epoch 2

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_unpruned/dssd_training_log_resnet18.csv
%env EPOCH=080
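
Instead of reading the CSV by eye, the best epoch can be picked programmatically. The sketch below assumes the log has an `epoch` column and a validation metric column named `mAP`; both column names are assumptions, so check the header printed by the `cat` command above and adjust them. It also assumes the epoch numbers in the log match the numbers in the checkpoint file names.

```python
import csv
import io
import math


def best_epoch(csv_text, metric="mAP", epoch_col="epoch"):
    """Return (epoch, value) for the row with the highest metric, or None.

    Rows where the metric is missing, non-numeric or NaN are skipped,
    which covers epochs where no validation was run.
    """
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row[metric])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isnan(value):
            continue
        if best is None or value > best[1]:
            best = (int(row[epoch_col]), value)
    return best


# Made-up sample data, only to show the call pattern.
sample = "epoch,mAP\n1,nan\n2,0.41\n3,0.47\n"
epoch, score = best_epoch(sample)
print(f"{epoch:03d}")  # prints 003, zero-padded like the checkpoint names
```

The zero-padded value is what `%env EPOCH=...` expects, since the checkpoints are named `dssd_resnet18_epoch_<EPOCH>.tlt`.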

## 7. Evaluate trained models <a class="anchor" id="head-7"></a>

In [None]:
!tao model dssd evaluate --gpu_index=$GPU_INDEX \
                   -e $SPECS_DIR/dssd_train_resnet18_kitti.txt \
                   -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/dssd_resnet18_epoch_$EPOCH.tlt \
                   -k $KEY

## 8. Prune trained models <a class="anchor" id="head-8"></a>
* Specify pre-trained model
* Equalization criterion (applies only to ResNets and MobileNets, since they contain element-wise operations)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you only need to adjust `-pth` (threshold) to trade accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The right threshold depends on the dataset and the model. The `0.1` in the block below is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.
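
To see why a higher threshold yields a smaller model, consider this toy sketch of norm-based channel selection: each filter gets an L2 norm, norms are normalized by the largest one, and only filters above the threshold are kept. This is a simplified illustration of the general idea, not TAO's pruning algorithm; the function name and the example weights are made up.

```python
import math


def channels_to_keep(filters, threshold):
    """Return indices of filters whose normalized L2 norm exceeds threshold."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    max_norm = max(norms)
    if max_norm == 0:
        return []  # all-zero layer: nothing carries signal
    return [i for i, n in enumerate(norms) if n / max_norm > threshold]


# Three made-up filters with norms of roughly 5, 1 and 0.05.
filters = [[3.0, 4.0], [0.6, 0.8], [0.03, 0.04]]
print(channels_to_keep(filters, 0.1))  # [0, 1]
print(channels_to_keep(filters, 0.5))  # [0]
```

Raising the threshold from 0.1 to 0.5 drops one more filter, which is the size versus accuracy trade-off described above.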

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao model dssd prune --gpu_index=$GPU_INDEX \
                -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/dssd_resnet18_epoch_$EPOCH.tlt \
                -o $EXPERIMENT_DIR/experiment_dir_pruned/dssd_resnet18_pruned.tlt \
                -eq intersection \
                -pth 0.1 \
                -k $KEY

In [None]:
!ls -rlt $EXPERIMENT_DIR/experiment_dir_pruned/

## 9. Retrain pruned models <a class="anchor" id="head-9"></a>
* The model needs to be retrained to recover the accuracy lost to pruning
* Specify the retraining specification
* WARNING: training can take anywhere from several hours to a full day to complete

In [None]:
# Printing the retrain spec file.
# Here we have updated the spec file to use the newly pruned model as the pretrained weights.
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/dssd_retrain_resnet18_kitti.txt
!cat $SPECS_DIR/dssd_retrain_resnet18_kitti.txt

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights 
!tao model dssd train --gpus $NUM_GPUS --gpu_index=$GPU_INDEX \
                -e $SPECS_DIR/dssd_retrain_resnet18_kitti.txt \
                -r $EXPERIMENT_DIR/experiment_dir_retrain \
                -m $EXPERIMENT_DIR/experiment_dir_pruned/dssd_resnet18_pruned.tlt \
                -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlt $EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_retrain/dssd_training_log_resnet18.csv
%env EPOCH=080

## 10. Evaluate retrained model <a class="anchor" id="head-10"></a>

In [None]:
!tao model dssd evaluate --gpu_index=$GPU_INDEX \
                   -e $SPECS_DIR/dssd_retrain_resnet18_kitti.txt \
                   -m $EXPERIMENT_DIR/experiment_dir_retrain/weights/dssd_resnet18_epoch_$EPOCH.tlt \
                   -k $KEY

## 11. Visualize inferences <a class="anchor" id="head-11"></a>
In this section, we run the `infer` tool to generate inferences on the trained models and visualize the results.

In [None]:
!ls $DATA_DIR/val/images

In [None]:
# Copy some test images
!mkdir -p $DATA_DIR/test_samples
!cp $DATA_DIR/val/images/* $DATA_DIR/test_samples

In [None]:
# Running inference for detection on the test images
!tao model dssd inference --gpu_index=$GPU_INDEX -i $DATA_DIR/test_samples \
                    -o $EXPERIMENT_DIR/dssd_infer_images \
                    -e $SPECS_DIR/dssd_retrain_resnet18_kitti.txt \
                    -m $EXPERIMENT_DIR/experiment_dir_retrain/weights/dssd_resnet18_epoch_$EPOCH.tlt \
                    -l $EXPERIMENT_DIR/dssd_infer_labels \
                    -k $KEY

The `tao` inference tool produces two outputs:
1. Images with the detected bounding boxes overlaid, in `$EXPERIMENT_DIR/dssd_infer_images`
2. Per-frame bounding box labels in KITTI format, in `$EXPERIMENT_DIR/dssd_infer_labels`
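
To work with the label files programmatically, each line can be split into the standard KITTI fields: object type, truncation, occlusion, alpha, the four bounding box coordinates (left, top, right, bottom), then 3D dimensions, location and rotation. The sketch below reads only the class name and 2D box. It assumes class names contain no spaces and treats a 16th field, if present, as a confidence score; the `KittiBox` type and `parse_kitti_line` helper are hypothetical names.

```python
from typing import NamedTuple, Optional


class KittiBox(NamedTuple):
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    score: Optional[float]


def parse_kitti_line(line):
    """Parse one KITTI label line into a KittiBox."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    xmin, ymin, xmax, ymax = map(float, fields[4:8])  # 2D box in pixels
    # Assumption: an optional 16th field holds the detection confidence.
    score = float(fields[15]) if len(fields) > 15 else None
    return KittiBox(fields[0], xmin, ymin, xmax, ymax, score)


# Made-up example line in KITTI layout.
line = "car 0.00 0 0.00 100.0 120.0 200.0 220.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.91"
print(parse_kitti_line(line))
```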

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    """Show up to num_images images from $EXPERIMENT_DIR/image_dir in a grid."""
    output_path = os.path.join(os.environ['EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2-D even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort for a deterministic order; os.listdir returns entries in arbitrary order.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
OUTPUT_PATH = 'dssd_infer_images' # relative path from $EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)