# Real-time object detection for disaster response using Transfer Learning Toolkit

James Skinner, jskinner@nvidia.com

## Setup
Sign up for a free NVIDIA GPU Cloud (NGC) account at [ngc.nvidia.com](https://ngc.nvidia.com), then pull the Transfer Learning Toolkit (TLT) container from https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics

Pull and enter the container

    DATA_DIR=/path/to/your/data
    WORKING_DIR=/path/to/workingdir # include the "specs" and "deepstream" directories
    docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2
    docker run --runtime=nvidia -it -v $DATA_DIR:/data \
        -v $WORKING_DIR:/src -p 8888:8888 \
        nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2 /bin/bash

Configure TLT to use your NGC API key

    ngc config set

Set your NGC API key and some local directories

In [None]:
KEY='<YOUR API KEY>'

# Where we will save our data
TLT_DIR='/data/tlt_working_dir'

# Where our data is stored
DATA_DIR='/data/stanford/kitti2'

## Download pre-trained model
View models available on NGC.

NB: If this doesn't work, did you run `ngc config set`?

In [None]:
!ngc registry model list *detectnet*

Download your chosen model

In [None]:
!ngc registry model download-version nvidia/iva/tlt_resnet50_detectnet_v2:1 -d $TLT_DIR

Let's examine the contents of that folder, to check that a `.hdf5` file has been downloaded.

In [None]:
!ls $TLT_DIR/tlt_resnet50_detectnet_v2_v1

## Review data
We are using the [Stanford Drones Dataset](http://cvgl.stanford.edu/projects/uav_data/). 

An extract from the raw data annotations.

In [None]:
!head /data/stanford/annotations/coupa/video0/annotations.txt
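
To work with these annotations in Python, each line can be parsed into named fields. The sketch below assumes the ten-column layout described in the dataset's README (track ID, box corners, frame, three flags, quoted label); check this against your copy of the data. The sample line is made up for illustration.

```python
import shlex
from collections import namedtuple

# Field order assumed from the Stanford Drone Dataset README; verify locally.
SddBox = namedtuple(
    "SddBox",
    "track_id xmin ymin xmax ymax frame lost occluded generated label",
)

def parse_sdd_line(line):
    """Parse one space-separated annotation line into an SddBox."""
    parts = shlex.split(line)  # shlex removes the quotes around the label
    if len(parts) != 10:
        raise ValueError("expected 10 fields, got %d: %r" % (len(parts), line))
    numbers = [int(p) for p in parts[:9]]
    return SddBox(*(numbers + [parts[9]]))

def visible_boxes(lines):
    """Keep only boxes that are not flagged as lost (outside the view)."""
    boxes = [parse_sdd_line(l) for l in lines if l.strip()]
    return [b for b in boxes if not b.lost]

# Illustrative values only, not taken from the dataset.
example = parse_sdd_line('0 213 1038 241 1072 10000 1 0 0 "Biker"')
print(example.label, example.frame)
```

Dropping `lost` boxes before writing labels is a choice on my part here; keep them if your pipeline needs them.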

Each video is of a different resolution and duration.

In [None]:
%%bash
ffmpeg -i /data/stanford/videos/coupa/video0/video.mov 2>&1 | grep Video: | grep -Po '\d{3,5}x\d{3,5}'
ffprobe -i /data/stanford/videos/coupa/video0/video.mov 2>&1 -show_format | grep duration
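
If you would rather collect resolution and duration from Python than grep them out of the log, `ffprobe` can emit JSON. This is a minimal sketch: it assumes `ffprobe` is on the PATH, and the helper names are my own rather than part of any toolkit.

```python
import json
import subprocess

def ffprobe_command(video_path):
    """Build an ffprobe call that reports width, height and duration as JSON."""
    return [
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "stream=width,height:format=duration",
        "-of", "json", video_path,
    ]

def parse_probe(json_text):
    """Return (width, height, duration_seconds) from ffprobe's JSON output."""
    info = json.loads(json_text)
    stream = info["streams"][0]
    return stream["width"], stream["height"], float(info["format"]["duration"])

def probe_video(video_path):
    """Run ffprobe on one video file and parse the result."""
    out = subprocess.check_output(ffprobe_command(video_path))
    return parse_probe(out.decode("utf-8"))
```

Calling `probe_video('/data/stanford/videos/coupa/video0/video.mov')` should then return a tuple you can loop over for every video.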

I have pre-processed this by:
1. Saving out every frame from each video, for example:

        ffmpeg -i videos/bookstore/video0/video.mov -qscale:v 2 raw_jpgs/bookstore/bookstore_video0_%06d.jpg

2. Randomly selecting some frames from each video

        selected_frames = random.sample(frameslist, n_frames_per_vid)

3. Randomly cropping each frame to 768 x 768.

        im = Image.open(framepath)
        width, height = im.size
        # Select random crop
        crop_xmin = random.randint(0, width - crop_w)
        crop_ymin = random.randint(0, height - crop_h)
        crop_xmax = crop_xmin + crop_w
        crop_ymax = crop_ymin + crop_h


4. Saving the annotations out in [KITTI format](https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#kitti_file)

        kitti_output_list = [label, truncated, occluded, alpha, xmin, ymin, xmax, ymax, height_metres,
                             width_metres, length_metres, cam_x, cam_y, cam_z, rot]
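
Steps 3 and 4 interact: once a frame is cropped, every bounding box has to be shifted into the crop's coordinates and clipped before it is written out. The sketch below shows one way to do that. It is not the original pre-processing script; the helper names are hypothetical, and filling the unused 3D fields with zeros is an assumption.

```python
import random

def random_crop_origin(width, height, crop_w, crop_h, rng=random):
    """Pick the top-left corner of a crop that fits inside the frame.

    Raises ValueError if the frame is smaller than the crop.
    """
    crop_xmin = rng.randint(0, width - crop_w)
    crop_ymin = rng.randint(0, height - crop_h)
    return crop_xmin, crop_ymin

def box_in_crop(box, crop_xmin, crop_ymin, crop_w, crop_h):
    """Shift (xmin, ymin, xmax, ymax) into crop coordinates and clip it.

    Returns None when the box lies entirely outside the crop.
    """
    xmin, ymin, xmax, ymax = box
    xmin = max(xmin - crop_xmin, 0)
    ymin = max(ymin - crop_ymin, 0)
    xmax = min(xmax - crop_xmin, crop_w)
    ymax = min(ymax - crop_ymin, crop_h)
    if xmin >= xmax or ymin >= ymax:
        return None
    return xmin, ymin, xmax, ymax

def kitti_line(label, box):
    """Format one 15-field KITTI line; unused 3D fields are zero (assumption)."""
    xmin, ymin, xmax, ymax = box
    fields = [label, 0, 0, 0, xmin, ymin, xmax, ymax, 0, 0, 0, 0, 0, 0, 0]
    return " ".join(str(f) for f in fields)

# Illustrative numbers: a box at (900, 500)-(940, 560) in a crop starting at (800, 400).
shifted = box_in_crop((900, 500, 940, 560), 800, 400, 768, 768)
print(kitti_line("person", shifted))
```

Boxes that straddle the crop edge are clipped rather than dropped here; you may prefer to discard heavily truncated boxes instead.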

An example of a KITTI annotation file:

In [None]:
!head $DATA_DIR/labels/coupa_video0_000047.txt 
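
Reading a label file back in is a quick way to check class balance before training. A minimal sketch, assuming the KITTI field order listed above (label first, box in fields 5 to 8); the function names are illustrative.

```python
from collections import Counter

def read_kitti_labels(text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) from KITTI label text."""
    objects = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        box = tuple(float(v) for v in fields[4:8])
        objects.append((fields[0], box))
    return objects

def class_counts(objects):
    """Count how many objects of each class appear."""
    return Counter(label for label, _ in objects)
```

For a single file, `class_counts(read_kitti_labels(open(label_path).read()))` gives the per-class totals; summing the counters over all files gives the dataset-wide balance.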

An example image

In [None]:
from IPython.display import display
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os

def display_image(img_path):
    """Plot an image file (JPEG or PNG) with matplotlib."""
    pil_im = Image.open(img_path)
    im_array = np.asarray(pil_im)
    plt.imshow(im_array)
    plt.show()

path = os.path.join(DATA_DIR, 'images/coupa_video0_000047.jpg')
display(Image.open(path))

## Convert dataset to TFRecords
We convert our dataset into TFRecords using the `tlt-dataset-convert` command.

We use a spec file to describe the dataset: [convert.txt](specs/convert.txt)

**Change needed**: Be sure to update `root_directory_path` and `image_directory_path` to the location of your data.

In [None]:
!more specs/convert.txt
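
Before converting, it can be worth checking that every image has a label file and vice versa. The sketch below assumes the `images/` and `labels/` layout used earlier in this notebook, with matching file stems; the helper names are hypothetical.

```python
import os

def unmatched_stems(image_names, label_names):
    """Return (images without labels, labels without images) by file stem."""
    image_stems = {os.path.splitext(n)[0] for n in image_names}
    label_stems = {os.path.splitext(n)[0] for n in label_names}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

def check_kitti_dir(data_dir):
    """Compare the images/ and labels/ folders under data_dir."""
    images = os.listdir(os.path.join(data_dir, "images"))
    labels = os.listdir(os.path.join(data_dir, "labels"))
    return unmatched_stems(images, labels)
```

Running `check_kitti_dir(DATA_DIR)` should return two empty lists if the dataset is consistent.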

Now we make our TFRecords, creating the 15% validation split, as specified in convert.txt.

In [None]:
!tlt-dataset-convert -d specs/convert.txt -o $TLT_DIR/tfrecords

## Train
We use a spec file to control the training process: [train.txt](specs/train.txt)

In [None]:
# If necessary, first make a directory in which to save your trained model.
!mkdir $TLT_DIR/trained

In [None]:
!tlt-train detectnet_v2 -e specs/train.txt \
        -r $TLT_DIR/trained --gpus 8 -k $KEY

You should now find that your `trained` directory contains a number of models, named `model.step-xxx.tlt`. In my run, the final one has `xxx` = 133080.

In [None]:
!ls $TLT_DIR/trained

As we are going to use this model in several commands, let's save it as a variable.

In [None]:
MODEL=os.path.join(TLT_DIR, 'trained', 'model.step-133080.tlt')
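
Rather than hard-coding the step number, the most recent checkpoint can be picked out of the directory listing. A minimal sketch, assuming the `model.step-xxx.tlt` naming shown above; note that it selects the latest step, which is not necessarily the most accurate model.

```python
import re

STEP_PATTERN = re.compile(r"^model\.step-(\d+)\.tlt$")

def latest_checkpoint(filenames):
    """Return the model.step-<n>.tlt name with the largest n, or None."""
    best_step, best_name = -1, None
    for name in filenames:
        match = STEP_PATTERN.match(name)
        # Compare steps as integers: a string sort would rank 9000 above 133080.
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name
```

With this in place, `MODEL = os.path.join(TLT_DIR, 'trained', latest_checkpoint(os.listdir(os.path.join(TLT_DIR, 'trained'))))` sets the same variable without typing the step by hand.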

## Evaluate

In [None]:
!tlt-evaluate detectnet_v2 -e specs/train.txt -m $MODEL -k $KEY

I achieved the following accuracy:

    class name      average precision (in %)
    ------------  --------------------------
    person                           43.9481
    vehicle                          72.5628
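
If a single headline number is useful, the per-class values above can be averaged. This is plain arithmetic on the two reported figures, not additional output from `tlt-evaluate`.

```python
def mean_average_precision(ap_by_class):
    """Unweighted mean of the per-class average precision values."""
    return sum(ap_by_class.values()) / float(len(ap_by_class))

# Values copied from the table above.
ap = {"person": 43.9481, "vehicle": 72.5628}
print(round(mean_average_precision(ap), 2))
```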

## Infer
We use a spec file to control the inference process: [infer.txt](specs/infer.txt)

In [None]:
INFER_DIR='/data/stanford/kitti2/infer_imgs'
OUTPUT_DIR='~/inferred_images'
!tlt-infer detectnet_v2 -m $MODEL -i $INFER_DIR -o $OUTPUT_DIR -k $KEY -bs 16 -cp specs/infer.txt 

## Prune & retrain
In the webinar, I didn't prune or re-train my model due to time constraints. The [Getting Started Guide](https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#pruning_models) contains information about the various pruning options.

### Prune

In [None]:
!mkdir $TLT_DIR/pruned
!tlt-prune -pm $MODEL -o $TLT_DIR/pruned -pth 0.30 -nf 16 -k $KEY

Now we retrain the pruned model.

### Re-train

We use a new spec file to control the training process: [retrain.txt](specs/retrain.txt)

Make sure you change the `pretrained_model_file` to the model produced by the pruning process:

In [None]:
!ls $TLT_DIR/pruned/*

Now we can re-train

In [None]:
!mkdir $TLT_DIR/retrained
!tlt-train detectnet_v2 -e specs/retrain.txt \
        -r $TLT_DIR/retrained --gpus 8 -k $KEY

As before, we need to establish the best trained model and save that to a convenient variable

In [None]:
!ls $TLT_DIR/retrained

In [None]:
MODEL=os.path.join(TLT_DIR, 'retrained', 'model.step-133080.tlt')

## Export

You can export in any of FP32, FP16 or INT8 precision.

In [None]:
#If necessary, make an export directory
!mkdir $TLT_DIR/exports

### Export with FP32 precision

In [None]:
!tlt-export $MODEL -k $KEY \
    --export_module detectnet_v2 --outputs output_bbox/BiasAdd,output_cov/Sigmoid \
    --data_type fp32 --output_file $TLT_DIR/exports/FP32_model.etlt

### Export with FP16 precision

In [None]:
!tlt-export $MODEL -k $KEY \
    --export_module detectnet_v2 --outputs output_bbox/BiasAdd,output_cov/Sigmoid \
    --data_type fp16 --output_file $TLT_DIR/exports/FP16_model.etlt

### Export with INT8 precision

First we generate an INT8 calibration file

In [None]:
!tlt-int8-tensorfile detectnet_v2 -e specs/train.txt \
        -o $TLT_DIR/exports/calibration.tensor -m 20

Now we export the model

In [None]:
!tlt-export $MODEL -k $KEY \
    --export_module detectnet_v2 --outputs output_bbox/BiasAdd,output_cov/Sigmoid \
    --data_type int8 --output_file $TLT_DIR/exports/INT8_model.etlt \
    --cal_data_file $TLT_DIR/exports/calibration.tensor \
    --cal_cache_file $TLT_DIR/exports/calibration.bin \
    --input_dims 3,768,768
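
To see why INT8 export needs calibration data at all: mapping floats onto 8-bit integers requires a scale, and the scale has to come from the range of values actually observed. The toy sketch below uses a simple max-abs rule to make that concrete. It is an illustration only; TensorRT chooses ranges with its own calibration algorithm, which this does not reproduce.

```python
def int8_scale(calibration_values):
    """Scale that maps the largest calibration magnitude to 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(value, scale):
    """Round to the nearest INT8 level and clamp to [-127, 127]."""
    q = int(round(value / scale))
    return max(-127, min(127, q))

def dequantize(q, scale):
    """Map an INT8 level back to an approximate float."""
    return q * scale

# Illustrative activations standing in for calibration data.
scale = int8_scale([-2.54, 0.5, 1.0])
print(quantize(1.0, scale), quantize(10.0, scale))
```

Values outside the calibrated range saturate at the ends, which is why calibration batches should be representative of the data seen at inference time.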

## Review DeepStream config files

The `.etlt` files above can be run directly in DeepStream using the `tlt-encoded-model` and `tlt-model-key` parameters.

We are going to convert our `.etlt` models into TensorRT engines first, then use the config files below to run the model.

* [labels.txt](deepstream/labels.txt)
* [primary_inference.txt](deepstream/primary_inference.txt)
    * This is looking for a file called `INT8_m1.plan`. We need to build this file on the Jetson device.
* [stream_config.txt](deepstream/stream_config.txt)
    * You need to replace `/path/to/your/mp4/video` with the path to your input video.
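
The placeholder substitution can be scripted so it is not forgotten. A minimal sketch: the function name is hypothetical, and the sample config line in the example is invented for illustration rather than copied from `stream_config.txt`.

```python
PLACEHOLDER = "/path/to/your/mp4/video"

def set_video_path(config_text, video_path):
    """Swap the placeholder for a real path; fail loudly if it is missing."""
    if PLACEHOLDER not in config_text:
        raise ValueError("placeholder not found; was the config already edited?")
    return config_text.replace(PLACEHOLDER, video_path)

# Hypothetical config fragment, used only to demonstrate the replacement.
sample = "uri=file://" + PLACEHOLDER + "\n"
print(set_video_path(sample, "/home/user/clip.mp4"))
```

To apply it, read `deepstream/stream_config.txt`, pass its contents through `set_video_path`, and write the result back before copying the configs to the Jetson.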

## Copy files to edge device

Now we move to the Jetson AGX Xavier to run our inference.

    scp $TLT_DIR/exports/* <Jetson IP>:~/tlt
    
We also need to copy the DeepStream config files (discussed above) to a directory on the Jetson.

    scp -r deepstream/* <Jetson IP>:~/tlt/ds_configs
    
## Run DeepStream (on Jetson)

1. Download `tlt-converter` from developer.nvidia.com/transfer-learning-toolkit
2. Convert your model to a TensorRT Engine. This creates the `INT8_m1.plan` file discussed above.

        ./tlt-converter -k $KEY -d 3,768,768 \
        -o output_bbox/BiasAdd,output_cov/Sigmoid \
        -e ~/tlt/ds_configs/INT8_m1.plan \
        -t int8 \
        -c ~/tlt/calibration.bin \
        -m 1 \
        ~/tlt/INT8_model.etlt

3. Change to the DeepStream samples directory

        cd /opt/nvidia/deepstream/deepstream-4.0/samples
    
4. Run the stream!

        deepstream-app -c ~/tlt/ds_configs/stream_config.txt

## References

**Developer zone**, to download `tlt-converter`: [developer.nvidia.com/transfer-learning-toolkit](https://developer.nvidia.com/transfer-learning-toolkit)

**TLT getting started guide**: [docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html](https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide)

**DeepStream webinar**: [info.nvidia.com/deepstream-to-improve-video-analytics-reg-page.html](https://info.nvidia.com/deepstream-to-improve-video-analytics-reg-page.html?ncid=so-lin-d2-97653&ondemandrgt=yes)