## Get the TensorRT tar file before running this Notebook

1. Visit https://developer.nvidia.com/tensorrt
2. Clicking `Download now` from step one directs you to https://developer.nvidia.com/nvidia-tensorrt-download where you have to Login/Join Now for Nvidia Developer Program Membership
3. Now, in the download page: Choose TensorRT 8 in available versions
4. Agree to Terms and Conditions
5. Click on TensorRT 8.6 GA to expand the available options
6. Click on 'TensorRT 8.6 GA for Linux x86_64 and CUDA 12.0 and 12.1 TAR Package' to dowload the TAR file
7. Upload the the tar file to your Google Drive

## Connect to GPU Instance

1. Change Runtime type to GPU by Runtime(Top Left tab)->Change Runtime Type->GPU(Hardware Accelerator)
1. Then click on Connect (Top Right)

## Mounting Google drive
Mount your Google drive storage to this Colab instance

In [5]:
import sys
if 'google.colab' in sys.modules:
    %env GOOGLE_COLAB=1
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
else:
    %env GOOGLE_COLAB=0
    print("Warning: Not a Colab Environment")

env: GOOGLE_COLAB=1
Mounted at /content/drive


# Object Detection using TAO YOLOv4 Tiny

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used train technique where you use a model trained on one task and re-train to use it on a different task.

Train Adapt Optimize (TAO) Toolkit  is a simple and easy-to-use Python based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">


## Learning Objectives
In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and train a YOLO v4 Tiny model on the KITTI dataset
* Prune the trained YOLO v4 Tiny model
* Retrain the pruned model to recover lost accuracy
* Export the pruned model
* Quantize the pruned model using QAT
* Run Inference on the trained model
* Export the pruned, quantized and retrained model to a .etlt file for deployment to DeepStream

## Table of Contents

This notebook shows an example use case of YOLO v4 Tiny object detection using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables](#head-0)
1. [Prepare dataset and pre-trained model](#head-1) <br>
     1.1 [Download the dataset](#head-1-1)<br>
     1.2 [Verify the downloaded dataset](#head-1-2)<br>
     1.3 [Download pretrained model](#head-1-3)
2. [Setup GPU environment](#head-2) <br>
    2.1 [Setup Python environment](#head-2-1) <br>
3. [Generate tfrecords](#head-3)<br>
4. [Provide train specification](#head-4)
5. [Run TAO train](#head-5)
6. [Evaluate trained models](#head-6)
7. [Prune trained models](#head-7)
8. [Retrain pruned models](#head-8)
9. [Evaluate retrained model](#head-9)
10. [Visualize inferences](#head-10)


#### Note
1. This notebook currently is by default set up to run train using 1 GPU. To use more GPU's please update the env variable `$NUM_GPUS` accordingly
1. This notebook uses KITTI dataset by default, which should be around ~12 GB. If you are limited by Google-Drive storage, we recommend to:

    i. Download the dataset onto the local system

    ii. Run the utility script at $COLAB_NOTEBOOKS/tensorflow/utils/generate_kitti_subset.py in your local system

    iii. This generates a subset of coco dataset with number of sample images you wish for

    iv. Upload this subset onto Google Drive

1. Using the default config/spec file provided in this notebook, each weight file size of yolo_v4_tiny created during train will be ~68 MB

## 0. Set up env variables and set FIXME parameters <a class="anchor" id="head-0"></a>

#### FIXME
1. NUM_GPUS - set this to <= number of GPU's availble on the instance
1. COLAB_NOTEBOOKS_PATH - for Google Colab environment, set this path where you want to clone the repo to; for local system environment, set this path to the already cloned repo
1. EXPERIMENT_DIR - set this path to a folder location where pretrained models, checkpoints and log files during different model actions will be saved
1. delete_existing_experiments - set to True to remove existing pretrained models, checkpoints and log files of a previous experiment
1. DATA_DIR - set this path to a folder location where you want to dataset to be present
1. delete_existing_data - set this to True to remove existing preprocessed and original data
1. trt_tar_path - set this path of the uploaded TensorRT tar.gz file after browser download
1. trt_untar_folder_path - set to path of the folder where the TensoRT tar.gz file has to be untarred into
1. trt_version - set this to the version of TRT you have downloaded

In [6]:
# Setting up env variables for cleaner command line commands.
import os

%env TAO_DOCKER_DISABLE=1

%env KEY=nvidia_tlt
#FIXME1
%env NUM_GPUS=1

#FIXME2
%env COLAB_NOTEBOOKS_PATH=/content/drive/MyDrive/nvidia-tao
if os.environ["GOOGLE_COLAB"] == "1":
    if not os.path.exists(os.path.join(os.environ["COLAB_NOTEBOOKS_PATH"])):

      !git clone https://github.com/NVIDIA-AI-IOT/nvidia-tao.git $COLAB_NOTEBOOKS_PATH
else:
    if not os.path.exists(os.environ["COLAB_NOTEBOOKS_PATH"]):
        raise Exception("Error, enter the path of the colab notebooks repo correctly")

#FIXME3
%env EXPERIMENT_DIR=/content/drive/MyDrive/results/yolo_v4_tiny
#FIXME4
delete_existing_experiments = True
#FIXME5
%env DATA_DIR=/content/drive/MyDrive/kitti_data/
#FIXME6
delete_existing_data = False

if delete_existing_experiments:
    !sudo rm -rf $EXPERIMENT_DIR
if delete_existing_data:
    !sudo rm -rf $DATA_DIR

SPECS_DIR=f"{os.environ['COLAB_NOTEBOOKS_PATH']}/tensorflow/yolo_v4_tiny/specs"
%env SPECS_DIR={SPECS_DIR}
# Showing list of specification files.
!ls -rlt $SPECS_DIR

!sudo mkdir -p $DATA_DIR && sudo chmod -R 777 $DATA_DIR
!sudo mkdir -p $EXPERIMENT_DIR && sudo chmod -R 777 $EXPERIMENT_DIR

env: TAO_DOCKER_DISABLE=1
env: KEY=nvidia_tlt
env: NUM_GPUS=1
env: COLAB_NOTEBOOKS_PATH=/content/drive/MyDrive/nvidia-tao
env: EXPERIMENT_DIR=/content/drive/MyDrive/results/yolo_v4_tiny
env: DATA_DIR=/content/drive/MyDrive/kitti_data/
env: SPECS_DIR=/content/drive/MyDrive/nvidia-tao/tensorflow/yolo_v4_tiny/specs
total 5
-rw------- 1 root root 2040 Apr  1 04:36 yolo_v4_tiny_train_kitti.txt
-rw------- 1 root root  262 Apr  1 04:36 yolo_v4_tiny_tfrecords_kitti_val.txt
-rw------- 1 root root  266 Apr  1 04:36 yolo_v4_tiny_tfrecords_kitti_train.txt
-rw------- 1 root root 2006 Apr  1 04:36 yolo_v4_tiny_retrain_kitti.txt


## 1. Prepare dataset and pre-trained model <a class="anchor" id="head-1"></a>

We will be using NVIDIA created Synthetic Object detection data based on KITTI dataset format in this notebook. To find more details about kitti format, please visit [here](https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).

**If using custom dataset; it should follow this dataset structure**
```
$DATA_DIR/train
├── images
│   ├── image_name_1.jpg
│   ├── image_name_2.jpg
|   ├── ...
└── labels
    ├── image_name_1.txt
    ├── image_name_2.txt
    ├── ...
$DATA_DIR/val
├── images
│   ├── image_name_5.jpg
│   ├── image_name_6.jpg
|   ├── ...
└── labels
    ├── image_name_5.txt
    ├── image_name_6.txt
    ├── ...
```
The file name should be same for images and labels folders

### 1.1 Download the dataset <a class="anchor" id="head-1-1"></a>

In [14]:
!python3 -m pip install awscli
!aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_train.tar.gz $DATA_DIR/
!aws s3 cp --no-sign-request s3://tao-object-detection-synthetic-dataset/tao_od_synthetic_val.tar.gz $DATA_DIR/

!mkdir -p $DATA_DIR/train/ && rm -rf $DATA_DIR/train/*
!mkdir -p $DATA_DIR/val/ && rm -rf $DATA_DIR/val/*

!tar -xzf $DATA_DIR/tao_od_synthetic_train.tar.gz -C $DATA_DIR/train/
!tar -xzf $DATA_DIR/tao_od_synthetic_val.tar.gz -C $DATA_DIR/val/

Collecting awscli
  Using cached awscli-1.38.24-py3-none-any.whl.metadata (11 kB)
Collecting botocore==1.37.24 (from awscli)
  Using cached botocore-1.37.24-py3-none-any.whl.metadata (5.7 kB)
Collecting docutils<0.17,>=0.10 (from awscli)
  Using cached docutils-0.16-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from awscli)
  Using cached s3transfer-0.11.4-py3-none-any.whl.metadata (1.7 kB)
Collecting colorama<0.4.7,>=0.2.5 (from awscli)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting rsa<4.8,>=3.1.2 (from awscli)
  Using cached rsa-4.7.2-py3-none-any.whl.metadata (3.6 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from botocore==1.37.24->awscli)
  Using cached jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting pyasn1>=0.1.3 (from rsa<4.8,>=3.1.2->awscli)
  Downloading pyasn1-0.6.1-py3-none-any.whl.metadata (8.4 kB)
Using cached awscli-1.38.24-py3-none-any.whl (4.7 MB)
Downloading botocore-1.37.24-py3-none-any.whl (13.5 M

### 1.3 Download pre-trained model <a class="anchor" id="head-1-3"></a>

We will use NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](ngc.nvidia.com) and click the SETUP on the navigation bar.

In [15]:
# Installing NGC CLI on the local machine.
## Download and install
%env LOCAL_PROJECT_DIR=/ngc_content/
%env CLI=ngccli_cat_linux.zip
!sudo mkdir -p $LOCAL_PROJECT_DIR/ngccli && sudo chmod -R 777 $LOCAL_PROJECT_DIR

# Remove any previously existing CLI installations
!sudo rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.23.0/files/ngccli_linux.zip' -P $LOCAL_PROJECT_DIR/ngccli -O $LOCAL_PROJECT_DIR/ngccli/$CLI
!unzip -u -q "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))
!cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 $LOCAL_PROJECT_DIR/ngccli/ngc-cli/libstdc++.so.6

env: LOCAL_PROJECT_DIR=/ngc_content/
env: CLI=ngccli_cat_linux.zip
--2025-04-01 05:46:06--  https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.23.0/files/ngccli_linux.zip
Resolving api.ngc.nvidia.com (api.ngc.nvidia.com)... 35.83.233.203, 52.33.153.12
Connecting to api.ngc.nvidia.com (api.ngc.nvidia.com)|35.83.233.203|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://xfiles.ngc.nvidia.com/org/nvidia/team/ngc-apps/recipes/ngc_cli/versions/3.23.0/files/ngccli_linux.zip?Signature=Bakny3rEPhux24LfZQkNEH5-gCMGRr4YHrEr6~viWiUvBq-MR6lfR1BXS48PBi4FfI2D1NfQlWScZwqS4lckVzkI94zAadYly66QZJDEGysM3xDeH30B1nw6kdY~G9R68mugHRrrXrWDnZuA2k82KomZeg3PoOHpfYSWt1tSTc-c3n1VJK2AgCitmVDWb74CeSjK0Yto~l5tj1tvcC9F6Zya9c8GRkCya7AX6DMLzNgm93JAETwvIYU~aRAEJ6L7GSAwoWzbq94KbsnIUzudIf4TAcwRMzuPZ6Ra~pR2lSkaLE0nefy0Kjbpo68iR5GrHt694x3t2pgDvfKRpSrFJQ__&Expires=1743572767&Key-Pair-Id=KCX06E8E9L60W [following]
--2025-04-01 05:46:07--  https://xfiles.ngc.nvidia.com/

In [16]:
!ngc registry model list nvstaging/tao/pretrained_object_detection:*

CLI_VERSION: Latest - 3.63.0 available (current: 3.23.0). Please update by using the command 'ngc version upgrade' 

+---------+----------+--------+-------+-------+----------+------+--------+---------+
| Version | Accuracy | Epochs | Batch | GPU   | Memory F | File | Status | Created |
|         |          |        | Size  | Model | ootprint | Size |        | Date    |
+---------+----------+--------+-------+-------+----------+------+--------+---------+
+---------+----------+--------+-------+-------+----------+------+--------+---------+


In [17]:
!mkdir -p $EXPERIMENT_DIR/pretrained_cspdarknet_tiny

In [18]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
                   --dest $EXPERIMENT_DIR/pretrained_cspdarknet_tiny

[2KGetting files to download...
[?25l[32m⠋[0m [36m━━━━━━━[0m • [32m0.0/28…[0m • [36mRemaining:[0m [36m-:--:--[0m • [31m?[0m • [33mElapsed:[0m [33m0:00:…[0m • [34mTotal: 1 - Completed: 0 - Failed: 0[0m
[2K[1A[2K[32m⠙[0m [36m━━━━━━━[0m • [32m0.0/28…[0m • [36mRemaining:[0m [36m-:--:--[0m • [31m?[0m • [33mElapsed:[0m [33m0:00:…[0m • [34mTotal: 1 - Completed: 0 - Failed: 0[0m
[2K[1A[2K[32m⠹[0m [36m━━━━━━━[0m • [32m0.0/28…[0m • [36mRemaining:[0m [36m-:--:--[0m • [31m?[0m • [33mElapsed:[0m [33m0:00:…[0m • [34mTotal: 1 - Completed: 0 - Failed: 0[0m
[2K[1A[2K[32m⠼[0m [36m━━━━━━━[0m • [32m0.0/28…[0m • [36mRemaining:[0m [36m-:--:--[0m • [31m?[0m • [33mElapsed:[0m [33m0:00:…[0m • [34mTotal: 1 - Completed: 0 - Failed: 0[0m
[2K[1A[2K[32m⠼[0m [36m━━━━━━━[0m • [32m0.0/28…[0m • [36mRemaining:[0m [36m-:--:--[0m • [31m?[0m • [33mElapsed:[0m [33m0:00:…[0m • [34mTotal: 1 - Completed: 1 - Failed: 0[0m
  

In [19]:
print("Check that model is downloaded into dir.")
!ls -l $EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny

Check that model is downloaded into dir.
total 1
-rw------- 1 root root 110 Apr  1 05:46 cspdarknet_tiny.hdf5


## 2. Setup GPU environment <a class="anchor" id="head-2"></a>


### 2.1 Setup Python environment <a class="anchor" id="head-2-1"></a>
Setup the environment necessary to run the TAO Networks by running the bash script

In [8]:
# FIXME 7: set this path of the uploaded TensorRT tar.gz file after browser download
trt_tar_path="/content/drive/MyDrive/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz"

import os
if not os.path.exists(trt_tar_path):
  raise Exception("TAR file not found in the provided path")

# FIXME 8: set to path of the folder where the TensoRT tar.gz file has to be untarred into
%env trt_untar_folder_path=/content/trt_untar
# FIXME 9: set this to the version of TRT you have downloaded
%env trt_version=10.0.3.26

!sudo mkdir -p $trt_untar_folder_path && sudo chmod -R 777 $trt_untar_folder_path/

import os

untar = True
for fname in os.listdir(os.environ.get("trt_untar_folder_path", None)):
  if fname.startswith("TensorRT-"+os.environ.get("trt_version")) and not fname.endswith(".tar.gz"):
    untar = False

if untar:
  !tar -xzf $trt_tar_path -C /content/trt_untar

if os.environ.get("LD_LIBRARY_PATH","") == "":
  os.environ["LD_LIBRARY_PATH"] = ""
trt_lib_path = f':{os.environ.get("trt_untar_folder_path")}/TensorRT-{os.environ.get("trt_version")}/lib'
os.environ["LD_LIBRARY_PATH"]+=trt_lib_path

env: trt_untar_folder_path=/content/trt_untar
env: trt_version=10.0.3.26


In [9]:
import os
if os.environ["GOOGLE_COLAB"] == "1":
    os.environ["bash_script"] = "setup_env.sh"
else:
    os.environ["bash_script"] = "setup_env_desktop.sh"

os.environ["NV_TAO_TF_TOP"] = "/tmp/tao_tensorflow1_backend/"

!sed -i "s|PATH_TO_TRT|$trt_untar_folder_path|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|TRT_VERSION|$trt_version|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script
!sed -i "s|PATH_TO_COLAB_NOTEBOOKS|$COLAB_NOTEBOOKS_PATH|g" $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

!sh $COLAB_NOTEBOOKS_PATH/tensorflow/$bash_script

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
creating build/bdist.linux-x86_64/wheel/nvidia_tao_tf1/cv/common/entrypoint
copying build/lib/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py -> build/bdist.linux-x86_64/wheel/./nvidia_tao_tf1/cv/common/entrypoint
copying build/lib/nvidia_tao_tf1/cv/common/entrypoint/__init__.py -> build/bdist.linux-x86_64/wheel/./nvidia_tao_tf1/cv/common/entrypoint
creating build/bdist.linux-x86_64/wheel/nvidia_tao_tf1/cv/common/evaluator
copying build/lib/nvidia_tao_tf1/cv/common/evaluator/ap_evaluator.py -> build/bdist.linux-x86_64/wheel/./nvidia_tao_tf1/cv/common/evaluator
copying build/lib/nvidia_tao_tf1/cv/common/evaluator/__init__.py -> build/bdist.linux-x86_64/wheel/./nvidia_tao_tf1/cv/common/evaluator
creating build/bdist.linux-x86_64/wheel/nvidia_tao_tf1/cv/mask_rcnn
creating build/bdist.linux-x86_64/wheel/nvidia_tao_tf1/cv/mask_rcnn/hyperparameters
copying build/lib/nvidia_tao_tf1/cv/mask_rcnn/hyperparameters/params_io.py -> 

## 3. Generate tfrecords <a class="anchor" id="head-3"></a>

The default YOLOv4 Tiny data format requires generation of TFRecords. Currently, the old sequence data format (image folders and label txt folders) is still supported and if you prefer to use the sequence data format, you can skip this section. To use sequence data format, please use spec file `yolo_v4_tiny_train_kitti_seq.txt` and `yolo_v4_tiny_retrain_kitti_seq.txt`. And you can check our user guide for more details about tfrecords generation and sequence data format usage.

Note: we observe that for YOLOv4 Tiny, when mosaic augmentation is turned on (mosaic_prob > 0), the sequence format has faster train speed.

Note: we observe the TFRecords format sometimes results in CUDA error during evaluation. Setting `force_on_cpu` in `nms_config` to `true` can help prevent this problem.

In [20]:
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/yolo_v4_tiny_tfrecords_kitti_train.txt
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/yolo_v4_tiny_tfrecords_kitti_val.txt

In [21]:
!tao model yolo_v4_tiny dataset_convert -d $SPECS_DIR/yolo_v4_tiny_tfrecords_kitti_train.txt \
                             -o $DATA_DIR/train/tfrecords/train

Using TensorFlow backend.
2025-04-01 05:46:50.720696: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4_tiny", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.yolo_v4.scripts, "yolo_v4", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in 

In [None]:
!tao model yolo_v4_tiny dataset_convert -d $SPECS_DIR/yolo_v4_tiny_tfrecords_kitti_val.txt \
                             -o $DATA_DIR/val/tfrecords/val

In [None]:
# If you use your own dataset, you will need to run the code below to generate the best anchor shape

# !tao model yolo_v4_tiny kmeans -l $DOWNLOAD_DIR/train/labels \
#                          -i $DOWNLOAD_DIR/train/images \
#                          -n 6 \
#                          -x 1280 \
#                          -y 720

# The anchor shape generated by this script is sorted. Write the first 3 into small_anchor_shape in the config
# file. Write middle 3 into mid_anchor_shape. Write last 3 into big_anchor_shape.

## 4. Provide train specification <a class="anchor" id="head-4"></a>
* Augmentation parameters for on-the-fly data augmentation
* Other train (hyper-)parameters such as batch size, number of epochs, learning rate etc.
* Whether to use quantization aware train (QAT)

In [None]:
# Provide pretrained model path
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/yolo_v4_tiny_train_kitti.txt
!sed -i "s|EXPERIMENT_DIR_PATH|$EXPERIMENT_DIR/|g" $SPECS_DIR/yolo_v4_tiny_train_kitti.txt

# To enable QAT train on sample spec file, uncomment following lines
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/yolo_v4_tiny_train_kitti.txt
# !sed -i "s/enable_qat: false/enable_qat: true/g" $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt

In [None]:
# By default, the sample spec file disables QAT train. You can force non-QAT train by running lines below
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/yolo_v4_tiny_train_kitti.txt
# !sed -i "s/enable_qat: true/enable_qat: false/g" $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt

In [None]:
!cat $SPECS_DIR/yolo_v4_tiny_train_kitti.txt

## 5. Run TAO train <a class="anchor" id="head-5"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: train will take several hours or one day to complete

In [None]:
!sudo rm -rf $EXPERIMENT_DIR/experiment_dir_unpruned
!mkdir -p $EXPERIMENT_DIR/experiment_dir_unpruned

In [None]:
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!tao model yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_kitti.txt \
                   -r $EXPERIMENT_DIR/experiment_dir_unpruned \
                   -k $KEY \
                   --gpus 1

In [None]:
print("To resume from checkpoint, please change pretrain_model_path to resume_model_path in config file.")

In [None]:
print('Model for each epoch:')
print('---------------------')
!ls -ltrh $EXPERIMENT_DIR/experiment_dir_unpruned/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_unpruned/yolov4_training_log_cspdarknet_tiny.csv
%env EPOCH=010

## 6. Evaluate trained models <a class="anchor" id="head-6"></a>

In [None]:
!tao model yolo_v4_tiny evaluate -e $SPECS_DIR/yolo_v4_tiny_train_kitti.txt \
                      -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                      -k $KEY

## 7. Prune trained models <a class="anchor" id="head-7"></a>
* Specify pre-trained model
* Equalization criterion (`Only for resnets as they have element wise operations or MobileNets.`)
* Threshold for pruning.
* A key to save and load the model
* Output directory to store the model

Usually, you just need to adjust `-pth` (threshold) for accuracy and model size trade off. Higher `pth` gives you smaller model (and thus higher inference speed) but worse accuracy. The threshold value depends on the dataset and the model. `0.5` in the block below is just a start point. If the retrain accuracy is good, you can increase this value to get smaller models. Otherwise, lower this value to get better accuracy.

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_pruned

In [None]:
!tao model yolo_v4_tiny prune -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                   -e $SPECS_DIR/yolo_v4_tiny_train_kitti.txt \
                   -o $EXPERIMENT_DIR/experiment_dir_pruned/yolov4_cspdarknet_tiny_pruned.hdf5 \
                   -eq intersection \
                   -pth 0.1 \
                   -k $KEY

In [None]:
!ls -rlt $EXPERIMENT_DIR/experiment_dir_pruned/

## 8. Retrain pruned models <a class="anchor" id="head-8"></a>
* Model needs to be re-trained to bring back accuracy after pruning
* Specify re-train specification
* WARNING: train will take several hours or one day to complete

In [None]:
# Printing the retrain spec file.
# Here we have updated the spec file to include the newly pruned model as a pretrained weights.
!sed -i "s|TAO_DATA_PATH|$DATA_DIR/|g" $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt
!sed -i "s|EXPERIMENT_DIR_PATH|$EXPERIMENT_DIR/|g" $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt
!cat $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt

In [None]:
!mkdir -p $EXPERIMENT_DIR/experiment_dir_retrain

In [None]:
# Retraining using the pruned model as pretrained weights
!tao model yolo_v4_tiny train --gpus 1 \
                   -e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt \
                   -r $EXPERIMENT_DIR/experiment_dir_retrain \
                   -k $KEY

In [None]:
# Listing the newly retrained model.
!ls -rlt $EXPERIMENT_DIR/experiment_dir_retrain/weights

In [None]:
# Now check the evaluation stats in the csv file and pick the model with highest eval accuracy.
!cat $EXPERIMENT_DIR/experiment_dir_retrain/yolov4_training_log_cspdarknet_tiny.csv
%env EPOCH=010

## 9. Evaluate retrained model <a class="anchor" id="head-9"></a>

In [None]:
!tao model yolo_v4_tiny evaluate -e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt \
                      -m $EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                      -k $KEY

## 10. Visualize inferences <a class="anchor" id="head-10"></a>
In this section, we run the `infer` tool to generate inferences on the trained models and visualize the results.

In [None]:
# Copy some test images
!mkdir -p $DATA_DIR/test_samples
!cp $DATA_DIR//val/images/* $DATA_DIR/test_samples/

In [None]:
# Running inference for detection on n images
!tao model yolo_v4_tiny inference -i $DATA_DIR/test_samples \
                       -r $EXPERIMENT_DIR/yolo_infer_images \
                       -e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt \
                       -m $EXPERIMENT_DIR/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.hdf5 \
                       -k $KEY

The `inference` tool produces two outputs.
1. Overlain images in `$EXPERIMENT_DIR/yolo_infer_images`
2. Frame by frame bbox labels in kitti format located in `$EXPERIMENT_DIR/yolo_infer_labels`

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=4, num_images=10):
    output_path = os.path.join(os.environ['EXPERIMENT_DIR'], image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30])
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path)
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
!mkdir -p $EXPERIMENT_DIR/yolo_infer_images
OUTPUT_PATH = 'yolo_infer_images/images_annotated' # relative path from $EXPERIMENT_DIR.
COLS = 3 # number of columns in the visualizer grid.
IMAGES = 9 # number of images to visualize.

visualize_images(OUTPUT_PATH, num_cols=COLS, num_images=IMAGES)

**QUANTIZE THE MODEL**

In [12]:
!pip install uff

Collecting uff
  Downloading uff-0.0.1.dev5.tar.gz (7.9 kB)
  [1;31merror[0m: [1msubprocess-exited-with-error[0m
  
  [31m×[0m [32mpython setup.py egg_info[0m did not run successfully.
  [31m│[0m exit code: [1;36m1[0m
  [31m╰─>[0m See above for output.
  
  [1;35mnote[0m: This error originates from a subprocess, and is likely not a problem with pip.
  Preparing metadata (setup.py) ... [?25l[?25herror
[1;31merror[0m: [1mmetadata-generation-failed[0m

[31m×[0m Encountered error while generating package metadata.
[31m╰─>[0m See above for output.

[1;35mnote[0m: This is an issue with the package mentioned above, not pip.
[1;36mhint[0m: See above for details.


In [25]:
!/content/trt_untar/TensorRT-10.3.0.26/bin/trtexec \
    --onnx=/content/drive/MyDrive/cementbag_v4.pt.onnx \
    --saveEngine=/content/drive/MyDrive/quantized_model.trt \
    --int8 \
    --workspace=4096

&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /content/trt_untar/TensorRT-10.3.0.26/bin/trtexec --onnx=/content/drive/MyDrive/cementbag_v4.pt.onnx --saveEngine=/content/drive/MyDrive/quantized_model.trt --int8 --workspace=4096
=== Model Options ===
  --onnx=<file>               ONNX model

=== Build Options ===
  --minShapes=spec                   Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec                   Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec                   Build with dynamic shapes using a profile with the max shapes provided
  --minShapesCalib=spec              Calibrate with dynamic shapes using a profile with the min shapes provided
  --optShapesCalib=spec              Calibrate with dynamic shapes using a profile with the opt shapes provided
  --maxShapesCalib=spec              Calibrate with dynamic shapes using a profile with the max shapes provided
                      

In [24]:
!/content/trt_untar/TensorRT-10.3.0.26/bin/trtexec --help

&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # /content/trt_untar/TensorRT-10.3.0.26/bin/trtexec --help
=== Model Options ===
  --onnx=<file>               ONNX model

=== Build Options ===
  --minShapes=spec                   Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec                   Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec                   Build with dynamic shapes using a profile with the max shapes provided
  --minShapesCalib=spec              Calibrate with dynamic shapes using a profile with the min shapes provided
  --optShapesCalib=spec              Calibrate with dynamic shapes using a profile with the opt shapes provided
  --maxShapesCalib=spec              Calibrate with dynamic shapes using a profile with the max shapes provided
                                     Note: All three of min, opt and max shapes must be supplied.
                                           Howev