# Hand detection using pretrained TLT DetectNet_v2

## Set up environment variables and map drives

When using the purpose-built pretrained models from NGC, make sure to set the `$KEY` environment variable to the key given in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

This notebook requires an environment variable called `$LOCAL_PROJECT_DIR` that points to the user's workspace. The dataset is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collateral generated by the TLT experiments is written to `$LOCAL_PROJECT_DIR/egohands`.
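
Cells that call `tlt` run inside the container and use container paths such as `/workspace/tlt-experiments/...`, while plain shell and Python cells use host paths under `$LOCAL_PROJECT_DIR`. The sketch below shows one way to translate between the two. The helper name `to_host_path` is hypothetical, and it only covers the project-root mount defined later in this notebook, not the separate specs mount.

```python
import posixpath

# Container-side root that this notebook maps to $LOCAL_PROJECT_DIR.
CONTAINER_ROOT = "/workspace/tlt-experiments"


def to_host_path(container_path, host_root, container_root=CONTAINER_ROOT):
    """Translate a path inside the TLT container to the matching host path.

    Hypothetical helper for illustration; it is not part of the TLT launcher.
    """
    rel = posixpath.relpath(container_path, container_root)
    # A relative path that climbs upward means the input is outside the mount.
    if rel == ".." or rel.startswith("../"):
        raise ValueError(
            "{} is not under {}".format(container_path, container_root)
        )
    return posixpath.normpath(posixpath.join(host_root, rel))


# Example with a made-up host root.
print(to_host_path(
    "/workspace/tlt-experiments/egohands/experiment_dir_unpruned",
    "/home/user/project",
))
```

In the notebook you would pass `os.environ["LOCAL_PROJECT_DIR"]` as `host_root` once that variable is set.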

*Note: Remove any stray artifacts or files from the `$USER_EXPERIMENT_DIR` and `$DATA_DOWNLOAD_DIR` paths defined below that may have been generated by previous experiments. Leftover checkpoint files and similar artifacts may interfere with creating a training graph for a new experiment.*

*Note: By default, this notebook is set up to run training on 1 GPU. To use more GPUs, update the `$NUM_GPUS` environment variable accordingly.*
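
As a rough aid for the clean-up note above, the following sketch lists files left over from earlier runs so you can review them before deleting anything. The helper name and the file patterns are assumptions chosen for illustration; adjust them to whatever your previous experiments produced.

```python
import fnmatch
import os

# Hypothetical patterns for leftovers; these are assumptions, not a TLT list.
STALE_PATTERNS = ("*.tlt", "*.etlt", "checkpoint*")


def find_stale_artifacts(root, patterns=STALE_PATTERNS):
    """Return sorted paths under root whose file name matches any pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Once the environment variables are defined, calling `find_stale_artifacts(os.environ["LOCAL_EXPERIMENT_DIR"])` returns the candidates. Note that the list may also include files you want to keep, such as a downloaded pretrained model.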

In [None]:
# Setting up env variables for cleaner command line commands.
import os

%env KEY=tlt_encode
%env NUM_GPUS=1
%env USER_EXPERIMENT_DIR=/workspace/tlt-experiments/egohands
%env DATA_DOWNLOAD_DIR=/workspace/tlt-experiments/data

Set this path according to your setup:

In [None]:
%env NOTEBOOK_ROOT=/home/USER_NAME/projects/WEBINAR_TLT_3.0_FINAL/training_tlt

In [None]:
# -p avoids an error when the directories already exist from a previous run.
!mkdir -p $NOTEBOOK_ROOT/data
!mkdir -p $NOTEBOOK_ROOT/egohands

In [None]:
# Define this local project directory that needs to be mapped to the TLT docker session.
# The dataset is expected to be present in $LOCAL_PROJECT_DIR/data, while the results for the steps
# in this notebook will be stored at $LOCAL_PROJECT_DIR/egohands
# !PLEASE MAKE SURE TO UPDATE THIS PATH!.

os.environ["LOCAL_PROJECT_DIR"] = os.environ["NOTEBOOK_ROOT"]

os.environ["LOCAL_DATA_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "data"
)
os.environ["LOCAL_EXPERIMENT_DIR"] = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "egohands"
)

# The sample spec files are present in the same path as the downloaded samples.
os.environ["LOCAL_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
%env SPECS_DIR=/workspace/tlt-experiments/egohands/specs

# Showing list of specification files.
!ls -rlt $LOCAL_SPECS_DIR

In [None]:
# Mapping up the local directories to the TLT docker.
import json
mounts_file = os.path.expanduser("~/.tlt_mounts.json")

# Define the dictionary with the mapped drives
drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tlt-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ]
}

# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(drive_map, mfile, indent=4)

In [None]:
!cat ~/.tlt_mounts.json

## Prepare dataset

Copy the `training` and `testing` directories of the EgoHands dataset, converted into KITTI format, to `$LOCAL_DATA_DIR`.
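
For reference, a KITTI label file stores one object per line as 15 space-separated fields: class name, truncation, occlusion, alpha, the 2D box (left, top, right, bottom), 3D dimensions, 3D location, and rotation. The sketch below extracts only the class and the 2D box. The parser name is hypothetical, and the sample line is made up for illustration (including the class name `hand`); it is not taken from the dataset.

```python
from collections import namedtuple

KittiBox = namedtuple("KittiBox", "label xmin ymin xmax ymax")


def parse_kitti_line(line):
    """Parse one KITTI label line and keep the class name and 2D box."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError("expected 15 fields, got {}".format(len(fields)))
    # Fields 4..7 hold the box as left, top, right, bottom in pixels.
    xmin, ymin, xmax, ymax = (float(value) for value in fields[4:8])
    return KittiBox(fields[0], xmin, ymin, xmax, ymax)


# Made-up sample line, not copied from EgoHands.
sample = (
    "hand 0.00 0 0.00 100.00 200.00 180.00 260.00 "
    "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
)
print(parse_kitti_line(sample))
```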

You may use this notebook with your own dataset as well. To do so, follow the directory structure below:

* training images in `$LOCAL_DATA_DIR/training/images`
* training labels in `$LOCAL_DATA_DIR/training/labels`
* testing images in `$LOCAL_DATA_DIR/testing/images`
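
The layout above can be sanity-checked before conversion. This is a minimal sketch that assumes each training image has a label file with the same base name and a `.txt` extension; the helper name and the list of image extensions are assumptions.

```python
import os

# Assumed image extensions; extend this set if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unpaired(images_dir, labels_dir):
    """Return (images without a label, labels without an image) by base name."""
    image_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(images_dir)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    }
    label_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(labels_dir)
        if name.lower().endswith(".txt")
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For the training set you would call it with `$LOCAL_DATA_DIR/training/images` and `$LOCAL_DATA_DIR/training/labels`; two empty lists mean every image and label is paired.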

### Verify dataset

In [None]:
DATA_DIR = os.environ.get('LOCAL_DATA_DIR')
num_training_images = len(os.listdir(os.path.join(DATA_DIR, "training/images")))
num_training_labels = len(os.listdir(os.path.join(DATA_DIR, "training/labels")))
num_testing_images = len(os.listdir(os.path.join(DATA_DIR, "testing/images")))
print("Number of images in the train/val set: {}".format(num_training_images))
print("Number of labels in the train/val set: {}".format(num_training_labels))
print("Number of images in the test set: {}".format(num_testing_images))

In [None]:
# Sample label.
!cat $LOCAL_DATA_DIR/training/labels/CARDS_COURTYARD_B_T_frame_0113.txt

### Prepare TfRecords from EgoHands dataset in KITTI format

* Update the TfRecords spec file to take in your KITTI format dataset
* Create the TfRecords using the `detectnet_v2 dataset_convert` command

*Note: TfRecords only need to be generated once.*

In [None]:
# Convert the KITTI trainval dataset to TfRecords under the output prefix given with -o.
print("Converting KITTI trainval dataset to TfRecords")
!tlt detectnet_v2 dataset_convert \
                  -d $SPECS_DIR/egohands_tfrecords_kitti_trainval.txt \
                  -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval

In [None]:
!ls -rlt $LOCAL_DATA_DIR/tfrecords/kitti_trainval/

## Download pre-trained model

Download the correct pretrained model from the NGC model registry for your experiment. Note that DetectNet_v2 expects the input to be 0-1 normalized with input channels in RGB order. Therefore, for optimum results, download model templates from `nvidia/tlt_pretrained_detectnet_v2`. The templates are now organized as version strings. For example, to download a resnet18 model suitable for DetectNet_v2, use the NGC object `nvidia/tlt_pretrained_detectnet_v2:resnet18`.

All other models expect input channels in BGR order and input preprocessing with mean subtraction. Using them as pretrained weights may result in suboptimal performance.

You can also experiment with the following purpose-built pretrained models:
* [PeopleNet](https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet)
* [TrafficCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tlt_trafficcamnet)
* [DashCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tlt_dashcamnet)
* [FaceDetect-IR](https://ngc.nvidia.com/catalog/models/nvidia:tlt_facedetectir) 

### Installing NGC CLI on the local machine

In [None]:
## Download and install
%env CLI=ngccli_reg_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

If you are running this for the first time, you must set up NGC keys. Follow the setup instructions [here](https://ngc.nvidia.com/setup/api-key).

In [None]:
# List models available in the model registry.
!ngc registry model list nvidia/tlt_pretrained_detectnet_v2:*

Let's initialize our model with DetectNet v2 ResNet18.

In [None]:
# Create the target destination to download the model.
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_detectnet_v2/

In [None]:
# Download the pretrained model from NGC
!ngc registry model download-version nvidia/tlt_pretrained_detectnet_v2:resnet18 \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_detectnet_v2

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_detectnet_v2

## Provide training specification

* TfRecords for the train datasets
    * To use the newly generated TfRecords, update the `dataset_config` parameter in the spec file at `$SPECS_DIR/egohands_train_resnet18_kitti.txt` (rename accordingly if using some other model).
    * Update the fold number to use for evaluation. For a random data split, use fold `0` only.
    * For sequence-wise split, you may use any fold generated from the dataset convert tool
* Pre-trained models
* Augmentation parameters for on-the-fly data augmentation
* Other training (hyper-)parameters such as batch size, number of epochs, learning rate, etc.

In [None]:
!cat $LOCAL_SPECS_DIR/egohands_train_resnet18_kitti.txt

## Run TLT training

Provide the sample spec file and the output directory location for the models.

*Note: The training may take hours to complete. Also, the rest of the notebook assumes that the training was done in single-GPU mode. When running in multi-GPU mode, expect to update the pruning and inference steps with new pruning thresholds and updated parameters in the clusterfile.json for optimum performance.*

*DetectNet_v2 now supports restarting from a checkpoint. If the training job is killed prematurely, you may resume training from the closest checkpoint by simply re-running the **same** command line. Make sure to use the <u>**same number of GPUs**</u> when restarting the training.*

*When running the training with `NUM_GPUS` > 1, you may need to modify the `batch_size_per_gpu` and `learning_rate` to get a similar mAP to a 1 GPU training run. In most cases, scaling down the batch size by a factor of `NUM_GPUS` or scaling up the learning rate by a factor of `NUM_GPUS` would be a good place to start.*
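
The two starting points mentioned in the note above amount to simple arithmetic, sketched below. The function name and the example values are hypothetical; the actual values belong in your training spec file, and either option is only a first guess to tune from.

```python
def multi_gpu_starting_points(batch_size_per_gpu, learning_rate, num_gpus):
    """Return the two alternative starting points for a multi-GPU run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        # Option 1: keep the learning rate, shrink the per-GPU batch size.
        "scale_batch_size": {
            "batch_size_per_gpu": max(1, batch_size_per_gpu // num_gpus),
            "learning_rate": learning_rate,
        },
        # Option 2: keep the per-GPU batch size, grow the learning rate.
        "scale_learning_rate": {
            "batch_size_per_gpu": batch_size_per_gpu,
            "learning_rate": learning_rate * num_gpus,
        },
    }


# Made-up single-GPU settings, scaled for 4 GPUs.
for name, settings in multi_gpu_starting_points(16, 5e-4, 4).items():
    print(name, settings)
```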

### DetectNet V2 initialized with ResNet18 weights in hdf5 format

In [None]:
!tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
!tlt detectnet_v2 evaluate -e $SPECS_DIR/egohands_train_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                           -k $KEY

### DetectNet V2 initialized with purpose-built PeopleNet pretrained in TLT

Let's try a different initialization for the sake of experimentation!

In [None]:
# List models available in the model registry.
!ngc registry model list nvidia/tlt_peoplenet:*

In [None]:
# Create the target destination to download the model.
!mkdir -p $LOCAL_EXPERIMENT_DIR/pretrained_peoplenet/

In [None]:
# Download the pretrained model from NGC
!ngc registry model download-version nvidia/tlt_peoplenet:unpruned_v2.1 \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_peoplenet

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/pretrained_peoplenet

In [None]:
!cat $LOCAL_SPECS_DIR/egohands_train_resnet34_kitti.txt

In [None]:
!tlt detectnet_v2 train -e $SPECS_DIR/egohands_train_resnet34_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet \
                        -k $KEY \
                        -n resnet34_detector \
                        --gpus $NUM_GPUS

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -lh $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet/weights

In [None]:
!tlt detectnet_v2 evaluate -e $SPECS_DIR/egohands_train_resnet34_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet/weights/resnet34_detector.tlt \
                           -k $KEY

This result is slightly better than our previous attempt. Let's continue with this model!

## Prune the trained model

Pruning parameters:

* pre-trained model
* equalization criterion (applicable for resnets and mobilenets)
* threshold for pruning
* a key to save and load the model
* output directory to store the model

*Usually, you just need to adjust `-pth` (threshold) to trade off accuracy against model size. A higher `pth` gives you a smaller model (and thus higher inference speed) but worse accuracy. The threshold to use depends on the dataset. A `pth` value of `5.2e-6` is just a starting point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.*

*In some internal studies, we have noticed that a `pth` value of 0.01 is a good starting point for DetectNet_v2 models.*

Check the model size before pruning.

In [None]:
filepath = os.getenv("NOTEBOOK_ROOT", os.getcwd()) \
    + "/egohands/experiment_dir_unpruned_peoplenet/weights/resnet34_detector.tlt"

print('{:,.0f}'.format(os.path.getsize(filepath)/float(1e+6))+" MB")

In [None]:
# Create an output directory if it doesn't exist.
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tlt detectnet_v2 prune \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_peoplenet/weights/resnet34_detector.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/resnet34_nopool_bn_detectnet_v2_pruned.tlt \
                  -eq union \
                  -pth 0.0000052 \
                  -k $KEY

Check the model size after pruning.

In [None]:
filepath = os.getenv("NOTEBOOK_ROOT", os.getcwd()) \
    + "/egohands/experiment_dir_pruned/resnet34_nopool_bn_detectnet_v2_pruned.tlt"

print('{:,.0f}'.format(os.path.getsize(filepath)/float(1e+6))+" MB")

A model usually loses some of its accuracy after pruning, so it needs to be retrained to recover that accuracy.
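
To put the two size checks above side by side, the sketch below reports both file sizes and their ratio. The helper name is hypothetical, and the file-size ratio is only a rough proxy for how much of the network was pruned; it says nothing about accuracy.

```python
import os


def size_reduction(unpruned_path, pruned_path):
    """Compare two model files by size in MB and by pruned/unpruned ratio."""
    before = os.path.getsize(unpruned_path)
    after = os.path.getsize(pruned_path)
    if before == 0:
        raise ValueError("unpruned model file is empty")
    return {
        "before_mb": before / 1e6,
        "after_mb": after / 1e6,
        # Values well below 1.0 indicate a much smaller pruned file.
        "ratio": after / before,
    }
```

You would pass the two `filepath` values used in the size-check cells, that is, the unpruned `resnet34_detector.tlt` and the pruned `resnet34_nopool_bn_detectnet_v2_pruned.tlt` under the local experiment directory.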

You should create a retraining specification that uses the pruned model as the pretrained weights.

*Note: For retraining, set the `load_graph` option to `true` in the `model_config` to load the pruned model graph. Also, if the model shows a decrease in mAP after retraining, the originally trained model may have been pruned a little too much. Try reducing the pruning threshold (thereby reducing the pruning ratio) and use the new model to retrain.*

Furthermore, DetectNet_v2 now supports quantization-aware training to optimize the model even more. This step is usually performed during retraining after pruning, so let's combine them in this example, too.

## Retraining after pruning with quantization aware training (QAT)

### Convert pruned model to QAT and retrain 

All DetectNet models, both unpruned and pruned, can be converted to QAT models by setting the `enable_qat` parameter in the `training_config` component of the spec file to `true`.
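
Before launching the retraining, it can help to confirm that the spec file really has the flags set. The following is a minimal sketch using a regular expression rather than a real protobuf text parser, so it assumes one field per line. The function name and the spec fragment are hypothetical and only illustrate the idea.

```python
import re


def spec_flag_is_true(spec_text, flag):
    """Return True if an uncommented `flag: true` line appears in the spec."""
    # Anchoring at the start of the line skips lines commented out with '#'.
    pattern = r"^\s*{}\s*:\s*(\w+)".format(re.escape(flag))
    match = re.search(pattern, spec_text, flags=re.MULTILINE)
    return bool(match) and match.group(1).lower() == "true"


# Made-up fragment in the style of a spec file, for illustration only.
example_spec = """
training_config {
  enable_qat: true
}
model_config {
  # load_graph: true
  load_graph: false
}
"""
print(spec_flag_is_true(example_spec, "enable_qat"))
print(spec_flag_is_true(example_spec, "load_graph"))
```

In the notebook you would read `$LOCAL_SPECS_DIR/egohands_retrain_resnet34_kitti_qat.txt` and check both `enable_qat` and `load_graph`.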

In [None]:
# Printing the retrain experiment file. 
# Note: We have updated the experiment file to convert the
# pretrained model to qat mode by setting the enable_qat
# parameter.
!cat $LOCAL_SPECS_DIR/egohands_retrain_resnet34_kitti_qat.txt

In [None]:
!tlt detectnet_v2 train -e $SPECS_DIR/egohands_retrain_resnet34_kitti_qat.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat \
                        -k $KEY \
                        -n resnet34_detector_pruned_qat \
                        --gpus $NUM_GPUS

In [None]:
!ls -rlt $LOCAL_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights

### Evaluate QAT converted model 


This section evaluates a QAT-enabled, pruned, and retrained model. The mAP of this model should be comparable to that of the pruned retrained model without QAT. However, due to quantization, you may sometimes see a drop in the mAP value for certain datasets.

In [None]:
!tlt detectnet_v2 evaluate -e $SPECS_DIR/egohands_retrain_resnet34_kitti_qat.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet34_detector_pruned_qat.tlt \
                           -k $KEY \
                           -f tlt

### Export QAT trained model to int8 

Export a QAT-trained model to a TensorRT-parsable model. This command generates an `.etlt` file from the trained model and serializes the corresponding INT8 scales as a TRT-readable calibration cache file.

*Note: This command shows how to convert the model to a TensorRT engine file. However, to deploy the model on Jetson, another engine specific to Jetson has to be created on the Jetson itself (we will cover it in the Deployment part of our tutorial).*

In [None]:
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet34_detector_qat.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin
!tlt detectnet_v2 export \
                  -m $USER_EXPERIMENT_DIR/experiment_dir_retrain_qat/weights/resnet34_detector_pruned_qat.tlt \
                  -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet34_detector_qat.etlt \
                  -k $KEY  \
                  --data_type int8 \
                  --batch_size 64 \
                  --max_batch_size 64 \
                  --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet34_detector_qat.trt.int8 \
                  --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration_qat.bin \
                  --verbose

### Evaluate a QAT trained model using the exported TensorRT engine

This section evaluates a QAT-enabled, pruned, and retrained model using the TensorRT INT8 engine that was exported in the previous step. Note that there may be a slight difference (~0.1-0.5%) in the mAP, owing to some differences in the implementation of quantization in TensorRT.

*Note: The TensorRT evaluator might be slightly slower than the TLT evaluator here, because the evaluation dataloader is pinned to the CPU to avoid any clashes between TensorRT and TLT instances in the GPU. Please note that this tool was not intended and has not been developed for profiling the model. It is just a means to qualitatively analyse the model. Please use native TensorRT or DeepStream for the most optimized inferences.*

In [None]:
!tlt detectnet_v2 evaluate -e $SPECS_DIR/egohands_retrain_resnet34_kitti_qat.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet34_detector_qat.trt.int8 \
                           -f tensorrt

### Inference using QAT engine 

Run inference and visualize detections on test images, using the exported TensorRT engine. Note that we are using an extra config file `egohands_inference_kitti_etlt_qat.txt` for that.

In [None]:
!tlt detectnet_v2 inference -e $SPECS_DIR/egohands_inference_kitti_etlt_qat.txt \
                            -o $USER_EXPERIMENT_DIR/tlt_infer_testing_qat \
                            -i $DATA_DOWNLOAD_DIR/testing/images \
                            -k $KEY

### Visualize some examples after QAT

In [None]:
# Simple grid visualizer
!pip3 install matplotlib==3.3.3
%matplotlib inline
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    output_path = os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
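    # squeeze=False keeps axarr two-dimensional even when there is a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)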
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 

In [None]:
# Visualize the first 12 inference results.
OUTPUT_PATH = 'tlt_infer_testing_qat/images_annotated' # relative path from $LOCAL_EXPERIMENT_DIR.
COLS = 4 # number of columns in the visualizer grid.
IMAGES = 12 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

## Deploying on Jetson

At this point you are ready to start the deployment part. You will need the following outputs from this experiment to proceed:

* `experiment_dir_final/calibration_qat.bin`
* `experiment_dir_final/resnet34_detector_qat.etlt`

Copy these files over to your Jetson and consult the `README.md` document for further instructions.
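
Before copying, a quick check that both outputs exist can save a round trip. This is a small sketch; the helper name is hypothetical, and the relative paths are the two files listed above.

```python
import os

# The two deployment outputs named in this section, relative to the experiment dir.
DEPLOY_FILES = (
    "experiment_dir_final/calibration_qat.bin",
    "experiment_dir_final/resnet34_detector_qat.etlt",
)


def missing_deploy_files(experiment_dir, relative_paths=DEPLOY_FILES):
    """Return the expected deployment files that are not present on disk."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(experiment_dir, rel))
    ]
```

Calling `missing_deploy_files(os.environ["LOCAL_EXPERIMENT_DIR"])` should return an empty list when the export step completed successfully.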