# Category-level Object Pose Estimation using TAO CenterPose

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which you take a model trained on one task and retrain it for a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## What is CenterPose?

[CenterPose](https://arxiv.org/abs/2109.06161) is a single-stage, keypoint-based approach for category-level object pose estimation. It operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions.

In TAO, two types of backbone networks are supported: [DLA34](https://arxiv.org/pdf/1707.06484.pdf) and [FAN](https://arxiv.org/abs/2204.12451). DLA34 is a standard Convolutional Neural Network (CNN) backbone, while FAN is a more advanced, transformer-based classification network. For more details about training FAN backbones, please refer to the classification PyTorch notebook.

### Sample prediction of CenterPose model
| **Shoes** | **Bottle** |
| :------:  | :------: |
|<img align="center" title="Shoes" src="https://github.com/vpraveen-nv/model_card_images/blob/main/cv/purpose_built_models/centerpose/image%202.png?raw=true" width="300" height="400"> |<img align="center" title="Bottle" src="https://github.com/vpraveen-nv/model_card_images/blob/main/cv/purpose_built_models/centerpose/image.png?raw=true" width="300" height="400">|

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and train a CenterPose model on the Google Objectron dataset
* Evaluate the trained model
* Run inference with the trained model and visualize the result
* Export the trained model to a .onnx file for deployment to DeepStream
* Generate a TensorRT engine using tao-deploy and verify the engine through evaluation

At the end of this notebook, you will have generated a trained `centerpose` model
which you may deploy via [DeepStream](https://developer.nvidia.com/deepstream-sdk).

## Table of Contents

This notebook shows an example use case of CenterPose using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate a trained model](#head-5)
6. [Visualize inferences](#head-6)
7. [Deploy](#head-7)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires you to set an environment variable called `$LOCAL_PROJECT_DIR` to the path of your workspace. Please note that the dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collateral generated by the TAO experiments is written to `$LOCAL_PROJECT_DIR/centerpose/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the container, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. Configure it for your specific case so that these directories are correctly visible to the Docker container.
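
To make the idea of a mount concrete, the sketch below translates a host path into the path the same file would have inside the container, given a list of `source`/`destination` pairs shaped like the `Mounts` entries of `~/.tao_mounts.json`. This is only an illustration of the mapping, not the launcher's own code: the helper name `to_container_path`, the longest-prefix rule, and the example paths under `/home/user` are all assumptions made for the example.

```python
import posixpath


def to_container_path(host_path, mounts):
    """Translate a host path to its container path (illustrative helper only)."""
    host_path = posixpath.normpath(host_path)
    best = None
    for mount in mounts:
        source = posixpath.normpath(mount["source"])
        # A path is covered by a mount if it equals the source or lies below it.
        if host_path == source or host_path.startswith(source + "/"):
            # Assumption: when mounts are nested, the most specific one wins.
            if best is None or len(source) > len(best[0]):
                best = (source, mount["destination"])
    if best is None:
        return None  # the path is not visible inside the container
    relative = host_path[len(best[0]):].lstrip("/")
    return posixpath.join(best[1], relative) if relative else best[1]


# Hypothetical mounts, shaped like the "Mounts" list in ~/.tao_mounts.json.
example_mounts = [
    {"source": "/home/user/tao-experiments",
     "destination": "/workspace/tao-experiments"},
    {"source": "/home/user/tao-experiments/centerpose/results",
     "destination": "/results"},
]

print(to_container_path(
    "/home/user/tao-experiments/centerpose/results/train/model.pth",
    example_mounts,
))  # /results/train/model.pth
```

This is why later cells use `$HOST_RESULTS_DIR` for commands that run on the host and `$RESULTS_DIR` for arguments passed to `tao` commands, which run inside the container.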


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "centerpose")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "centerpose", "results")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/centerpose

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [3]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
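
As a quick sanity check before installing the launcher, the sketch below compares the running interpreter against the Python range given in the requirements list above (3.7 through 3.10). The function name `python_version_supported` is made up for this example, and the check covers only the Python version, not the Docker or driver requirements.

```python
import sys

# Range copied from the requirements list above: python >=3.7, <=3.10.x
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def python_version_supported(version_info=None):
    """Return True if the major.minor version lies inside the listed range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


print("Python version supported:", python_version_supported())
```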

Once you have installed the prerequisites, please log in to the docker registry nvcr.io by running the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>
### 2.1 Download and preprocess the training, validation and testing dataset
We will be using the Google Objectron dataset for the tutorial. The following script downloads the Google Objectron dataset automatically.

Here's a description of the structure:

    |--category_dataset_root:
        |--train
            |--train_video1
                |--image1.jpg
                |--image1.json
                |--image2.jpg
                |--image2.json
            |--train_video2
                |--image1.jpg
                |--image1.json
                |--image2.jpg
                |--image2.json
        |--test/validation
            |--test_video1
                |--image1.jpg
                |--image1.json
                |--image2.jpg
                |--image2.json
            |--test_video2
                |--image1.jpg
                |--image1.json
                |--image2.jpg
                |--image2.json

* ``category_dataset_root`` is the directory of the specific category. It contains the following:
    * ``train``: Contains training images and their related ground truth. The images are extracted from the videos.
    * ``test/validation``: Contains testing/validation images and their related ground truth.
* If Python version < 3.10, please install `scipy==1.5.2` and `tensorflow==2.11.0`.
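
Because every image in this layout is expected to sit next to a ground-truth file with the same stem, a small check like the one below can flag frames whose `.json` is missing once the data has been extracted. It is a sketch, not part of TAO: the helper name `find_unpaired_images` and the accepted image extensions are assumptions, and the `bike/train` path simply mirrors the default category used later in this notebook.

```python
import os

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_unpaired_images(split_dir):
    """Return image paths under split_dir that lack a same-named .json file."""
    unpaired = []
    for root, _dirs, files in os.walk(split_dir):
        names = set(files)
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() in IMAGE_EXTENSIONS and stem + ".json" not in names:
                unpaired.append(os.path.join(root, name))
    return unpaired


# Only runs the check if the (assumed) split directory already exists.
split_dir = os.path.join(os.getenv("HOST_DATA_DIR", "."), "bike", "train")
if os.path.isdir(split_dir):
    print(f"{len(find_unpaired_images(split_dir))} image(s) without ground truth")
```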

In [None]:
# Install the dataset related dependencies.
!pip3 install scipy==1.9.2
!pip3 install tensorflow==2.14.0
!pip3 install opencv-python==4.8.0.74
!pip3 install tqdm==4.65.0

In [None]:
# Define the decoding functions.
import numpy as np
import cv2

def get_image(feature, shape=None):
    """Decode the tensorflow image example."""
    image = cv2.imdecode(
        np.asarray(bytearray(feature.bytes_list.value[0]), dtype=np.uint8),
        cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)
    if len(image.shape) > 2 and image.shape[2] > 1:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    if shape is not None:
        image = cv2.resize(image, shape)
    return image

def parse_plane(example):
    """Parses plane from a tensorflow example."""
    fm = example.features.feature
    if "plane/center" in fm and "plane/normal" in fm:
        center = fm["plane/center"].float_list.value
        center = np.asarray(center)
        normal = fm["plane/normal"].float_list.value
        normal = np.asarray(normal)
        return center, normal
    else:
        return None
    
def parse_example(example):
    """Parse the image example data"""
    fm = example.features.feature

    # Extract images, setting the input shape for Objectron Dataset
    image = get_image(fm["image/encoded"], shape=(600, 800))
    filename = fm["image/filename"].bytes_list.value[0].decode("utf-8")
    filename = filename.replace('/', '_')
    image_id = np.asarray(fm["image/id"].int64_list.value)[0]

    label = {}
    visibilities = fm["object/visibility"].float_list.value
    visibilities = np.asarray(visibilities)
    index = visibilities > 0.1

    if "point_2d" in fm:
        points_2d = fm["point_2d"].float_list.value
        points_2d = np.asarray(points_2d).reshape((-1, 9, 3))[..., :2]

    if "point_3d" in fm:
        points_3d = fm["point_3d"].float_list.value
        points_3d = np.asarray(points_3d).reshape((-1, 9, 3))

    if "object/scale" in fm:
        obj_scale = fm["object/scale"].float_list.value
        obj_scale = np.asarray(obj_scale).reshape((-1, 3))

    if "object/translation" in fm:
        obj_trans = fm["object/translation"].float_list.value
        obj_trans = np.asarray(obj_trans).reshape((-1, 3))

    if  "object/orientation" in fm:
        obj_ori = fm["object/orientation"].float_list.value
        obj_ori = np.asarray(obj_ori).reshape((-1, 3, 3))

    label["2d_instance"] = points_2d[index]
    label["3d_instance"] = points_3d[index]
    label["scale_instance"] = obj_scale[index]
    label["translation"] = obj_trans[index]
    label["orientation"] = obj_ori[index]
    label["image_id"] = image_id
    label["visibility"] = visibilities[index]
    label['ORI_INDEX'] = np.argwhere(index).flatten()
    label['ORI_NUM_INSTANCE'] = len(index)
    return image, label, filename

def parse_camera(example):
    """Parse the camera calibration data"""
    fm = example.features.feature
    if "camera/projection" in fm:
        proj = fm["camera/projection"].float_list.value
        proj = np.asarray(proj).reshape((4, 4))
    else:
        proj = None
        
    if "camera/view" in fm:
        view = fm["camera/view"].float_list.value
        view = np.asarray(view).reshape((4, 4))
    else:
        view = None
    
    if "camera/intrinsics" in fm:
        intrinsic = fm["camera/intrinsics"].float_list.value
        intrinsic = np.asarray(intrinsic).reshape((3, 3))
    else:
        intrinsic = None
    return proj, view, intrinsic

def partition(lst, n):
    """Equally split the video lists."""
    division = len(lst) / float(n) if n else len(lst)
    return [lst[int(np.round(division * i)): int(np.round(division * (i + 1)))] for i in range(n)]

* Note
    * Please select the **specific categories** you want to use for training the CenterPose model.
    * The cell will take several minutes to run because it involves dataset downloading and preprocessing.
    * Each category contains approximately 10,000 to 30,000 training images. Downloading all categories would require a large amount of drive space. The total size for downloading all 8 categories is 4.4TB.
    * By default, the training set and the validation set are downloaded. The validation set is a subset of the testing set, downsampled by keeping one frame out of every `VAL_FR` frames (60 in the cell below).
    * If you are using your own dataset, please ensure that the camera calibration information is correct.
    * **Note that the sample spec is not meant to produce SOTA (state-of-the-art) accuracy on the Objectron dataset. To reproduce SOTA, set `TRAIN_FR` to 15, `epoch` to 140 and `DATA_DOWNLOAD` to -1 to match the original parameters.**
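
The `TRAIN_FR`, `VAL_FR` and `TEST_FR` settings in the next cell act as strides: a frame is extracted only when `int(frame_id) % frame_rate == 0`. The sketch below isolates that rule so the effect of a setting is easy to see. The helper name `kept_frame_ids` and the 100-frame video are made up for illustration.

```python
def kept_frame_ids(frame_ids, frame_rate):
    """Keep the frames whose id is a multiple of frame_rate."""
    return [frame_id for frame_id in frame_ids if int(frame_id) % frame_rate == 0]


# A hypothetical 100-frame video with TRAIN_FR = 30 keeps four frames.
print(kept_frame_ids(range(100), 30))  # [0, 30, 60, 90]
```

A smaller value therefore keeps more frames per video, which is why the SOTA setting of `TRAIN_FR = 15` produces a larger training set than the default.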

In [None]:
import glob
import tqdm
import json
import requests
import shutil
import tensorflow as tf
import warnings
from scipy.spatial.transform import Rotation as R

OBJECTRON_BUCKET = "gs://objectron/v1/records_shuffled"
PUBLIC_URL = "https://storage.googleapis.com/objectron"
SAVE_DIR = os.getenv("HOST_DATA_DIR", os.getcwd())

# Please add the "test" into the array if you want to evaluate the whole testing set. It requires at least 30GB to download the bike category. 
# DATA_DISTRIBUTION = ['train', 'val', 'test']
DATA_DISTRIBUTION = ['train', 'val']

# Note that the sample spec is not meant to produce SOTA accuracy on Objectron dataset. 
# To reproduce SOTA, you should set `TRAIN_FR` as 15 and `DATA_DOWNLOAD` as -1 to match the original parameters.
TRAIN_FR = 30
VAL_FR = 60
TEST_FR = 1
DATA_DOWNLOAD = 10000

# Please select the specific categories that you want to train the CenterPose model. 
# CATEGORIES = ['bike', 'book', 'bottle', 'camera', 'cereal_box', 'chair', 'laptop', 'shoe']
CATEGORIES = ['bike']

memory_free = shutil.disk_usage(SAVE_DIR).free
if len(CATEGORIES) >= 8 and memory_free < 4.4E12:
    warnings.warn("Not enough disk space to download all 8 categories.")

for c in CATEGORIES:
    for dist in DATA_DISTRIBUTION:
        # Download the tfrecord files
        if dist in ['test', 'val']:
            eval_data = f'/{c}/{c}_test*'
            blob_path = PUBLIC_URL + f"/v1/index/{c}_annotations_test"
        elif dist in ['train']:
            eval_data = f'/{c}/{c}_train*'
            blob_path = PUBLIC_URL + f"/v1/index/{c}_annotations_train"
        else:
            raise ValueError("No specific data distribution settings.")

        eval_shards = tf.io.gfile.glob(OBJECTRON_BUCKET + eval_data)
        ds = tf.data.TFRecordDataset(eval_shards).take(DATA_DOWNLOAD)

        with tf.io.TFRecordWriter(f'{SAVE_DIR}/{c}_{dist}.tfrecord') as file_writer:
            for serialized in tqdm.tqdm(ds): 
                example = tf.train.Example.FromString(serialized.numpy())
                record_bytes = example.SerializeToString()
                file_writer.write(record_bytes)

        # Get the video ids
        video_ids = requests.get(blob_path).text
        video_ids = [i.replace('/', '_') for i in video_ids.split('\n')]
        
        # Work on a subset of the videos for each round, where the subset is equally split
        video_ids_split = partition(video_ids, int(np.floor(len(video_ids) / int(len(video_ids) / 2))))

        # Decode the tfrecord files
        tfdata = f'{SAVE_DIR}/{c}_{dist}*'
        eval_shards = tf.io.gfile.glob(tfdata)

        new_ds = tf.data.TFRecordDataset(eval_shards).take(-1)

        for subset in video_ids_split:
            videos = {}
            for serialized in tqdm.tqdm(new_ds):

                example = tf.train.Example.FromString(serialized.numpy())

                # Group according to video_id & image_id
                fm = example.features.feature
                filename = fm["image/filename"].bytes_list.value[0].decode("utf-8")
                video_id = filename.replace('/', '_')
                image_id = np.asarray(fm["image/id"].int64_list.value)[0]
                
                # Sometimes, data is too big to save, so we only focus on a small subset instead.
                if video_id not in subset:
                    continue
                
                if video_id in videos:
                    videos[video_id].append((image_id, example))
                else:
                    videos[video_id] = []
                    videos[video_id].append((image_id, example))
            
            # Save the decoded tfrecord files, one file per video.
            save_tfrecords = f'{SAVE_DIR}/{c}/tfrecords/{dist}'
            if not os.path.exists(save_tfrecords):
                os.makedirs(save_tfrecords)
            for video_id in tqdm.tqdm(videos):
                with tf.io.TFRecordWriter(f'{save_tfrecords}/{video_id}.tfrecord') as file_writer:
                    for image_data in videos[video_id]:
                        record_bytes = image_data[1].SerializeToString()
                        file_writer.write(record_bytes)

        # Extract the images and ground truth.
        videos = [os.path.splitext(os.path.basename(i))[0] for i in glob.glob(f'{save_tfrecords}/*.tfrecord')]
        if dist in ['train']:
            frame_rate = TRAIN_FR
        elif dist in ['val']:
            frame_rate = VAL_FR
        elif dist in ['test']:
            frame_rate = TEST_FR
        else:
            raise ValueError("No specific data distribution settings.")
        
        for idx, key in enumerate(videos):
            print(f'Video {idx}, {key}:')
            ds = tf.data.TFRecordDataset(f'{save_tfrecords}/{key}.tfrecord').take(-1)

            for serialized in tqdm.tqdm(ds):
                example = tf.train.Example.FromString(serialized.numpy())

                image, label, prefix = parse_example(example)
                frame_id = label['image_id']

                if int(frame_id) % frame_rate == 0:
                    
                    proj, view, cam_intrinsic = parse_camera(example)
                    plane = parse_plane(example)

                    cam_intrinsic[:2, :3] = cam_intrinsic[:2, :3] / 2.4
                    center, normal = plane
                    height, width, _ = image.shape

                    im_bgr = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                    
                    dict_out = {
                        "camera_data" : {
                            "width" : width,
                            'height' : height,
                            'camera_view_matrix':view.tolist(),
                            'camera_projection_matrix':proj.tolist(),
                            'intrinsics':{
                                'fx':cam_intrinsic[1][1],
                                'fy':cam_intrinsic[0][0],
                                'cx':cam_intrinsic[1][2],
                                'cy':cam_intrinsic[0][2]
                            }
                        }, 
                        "objects" : [],
                        "AR_data":{
                            'plane_center':[center[0],
                                            center[1],
                                            center[2]],
                            'plane_normal':[normal[0],
                                            normal[1],
                                            normal[2]]
                        }
                    }
                    
                    for object_id in range(len(label['2d_instance'])):
                        object_categories = c
                        quaternion = R.from_matrix(label['orientation'][object_id]).as_quat()
                        trans = label['translation'][object_id]

                        projected_keypoints = label['2d_instance'][object_id]
                        projected_keypoints[:, 0] *= width
                        projected_keypoints[:, 1] *= height

                        object_scale = label['scale_instance'][object_id]
                        keypoints_3d = label['3d_instance'][object_id]
                        visibility = label['visibility'][object_id]

                        dict_obj={
                            'class': object_categories,
                            'name': object_categories+'_'+str(object_id),
                            'provenance': 'objectron',
                            'location': trans.tolist(),
                            'quaternion_xyzw': quaternion.tolist(),
                            'projected_cuboid': projected_keypoints.tolist(),
                            'scale': object_scale.tolist(),
                            'keypoints_3d': keypoints_3d.tolist(),
                            'visibility': visibility.tolist()
                        }
                        # Final export
                        dict_out['objects'].append(dict_obj)

                    save_path = f"{SAVE_DIR}/{c}/{dist}/{prefix}/"
                    if not os.path.exists(save_path):
                        os.makedirs(save_path)

                    filename = f"{save_path}/{str(frame_id).zfill(5)}.json"
                    with open(filename, 'w+') as fp:
                        json.dump(dict_out, fp, indent=4, sort_keys=True)
                
                    cv2.imwrite(f"{save_path}/{str(frame_id).zfill(5)}.png", im_bgr)

### 2.2 Download the pre-trained model
We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP on the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"


# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
# Pull pretrained model from NGC
!mkdir -p $HOST_RESULTS_DIR/pretrained_models
!ngc registry model download-version "nvidia/tao/pretrained_fan_classification_nvimagenet:fan_small_hybrid_nvimagenet" --dest $HOST_RESULTS_DIR/pretrained_models

print("Check if model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_models/pretrained_fan_classification_nvimagenet_vfan_small_hybrid_nvimagenet/

## 3. Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * train_data: images and annotation files for train data. Required to have correct camera calibration data
    * val_data: images and annotation files for validation data. Required to have correct camera calibration data
    * num_classes: number of categories, default is 1 because CenterPose is a category-based method
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
    * category: category name of the training object
    * num_symmetry: number of symmetric rotations for the specific categories, e.g. bottle
    * max_objs: maximum number of training objects in one image
* model: configure the model setting
    * down_ratio: down sample ratio for the input image, default is 4
    * use_pretrained: flag to enable using the pretrained weights
    * model_type: backbone types of the CenterPose, including FAN-variants and the DLA34 backbone
    * pretrained_backbone_path: path to the pretrained backbone model. FAN variants are supported. The DLA34 backbone loads its pretrained weights automatically.
* train: configure the training hyperparameters
    * num_gpus: number of gpus 
    * validation_interval: validation interval
    * checkpoint_interval: interval of saving the checkpoint
    * num_epochs: number of epochs
    * clip_grad_val: the value used for clipping the gradient, default is 100.0
    * seed: random seed for reproducing the accuracy
    * resume_training_checkpoint_path: resume the training from the checkpoint path
    * precision: if set to fp16, training runs with Automatic Mixed Precision (AMP)
    * optim:
        * lr: learning rate for training the model
        * lr_steps: learning rate decay step milestone (MultiStep)

Please refer to the TAO documentation about CenterPose to get all the parameters that are configurable.
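
To show how `lr` and `lr_steps` interact under a MultiStep schedule, the sketch below computes the learning rate at a given epoch: the rate is multiplied by a decay factor each time the epoch count reaches a milestone. The decay factor `gamma=0.1` and the example values for `lr` and `lr_steps` are assumptions for illustration and are not taken from the sample spec.

```python
def multistep_lr(base_lr, lr_steps, epoch, gamma=0.1):
    """Learning rate at `epoch` under MultiStep decay (gamma is an assumed factor)."""
    # Count the milestones that have already been reached.
    passed = sum(1 for milestone in lr_steps if epoch >= milestone)
    return base_lr * (gamma ** passed)


# Hypothetical values: lr = 1e-3 with milestones at epochs 90 and 120.
for epoch in (0, 89, 90, 120):
    print(epoch, multistep_lr(1e-3, [90, 120], epoch))
```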


In [None]:
!cat $HOST_SPECS_DIR/train.yaml

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* Evaluation mainly uses 3D IoU and 2D MPE (mean pixel errors) metrics. For more info, please refer to: https://github.com/google-research-datasets/Objectron
* For this demonstration, we set the number of training epochs to 1 so that training completes faster.
* Unlike the [original CenterPose paper](https://arxiv.org/abs/2109.06161), we also provide a more advanced backbone called [FAN](https://arxiv.org/abs/2204.12451), which has been shown to achieve better downstream results than DLA34.
* If you wish to speed up training, you may try to set `train.precision=fp16` for mixed precision training.
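
The `tao` commands in this notebook accept `key=value` arguments after the spec file, such as `results_dir=$RESULTS_DIR/` or `train.precision=fp16`, where a dotted key addresses a nested field of the YAML spec. The sketch below illustrates that addressing on a plain dictionary. It is a conceptual model only and not the launcher's parsing code: the helper `apply_override` and the starting values in `config` are hypothetical.

```python
def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    dotted_key, value = override.split("=", 1)
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(key, {})
    node[keys[-1]] = value  # values stay strings in this sketch
    return config


# Hypothetical starting values standing in for a loaded spec file.
config = {"train": {"precision": "fp32", "num_epochs": 1},
          "dataset": {"batch_size": 4}}
apply_override(config, "train.precision=fp16")
apply_override(config, "dataset.batch_size=2")
print(config)
```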

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env MODEL_DIR = /model
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

In [None]:
!echo $HOST_DATA_DIR

In [None]:
print("For multi-GPU, change train.num_gpus in train.yaml based on your machine.")
# If you face out of memory issue, you may reduce the batch size in the spec file by passing dataset.batch_size=2
!tao model centerpose train \
          -e $SPECS_DIR/train.yaml \
          results_dir=$RESULTS_DIR/

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/centerpose_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/centerpose_model.pth
!ls -ltrh $HOST_RESULTS_DIR/train/centerpose_model.pth

## 5. Evaluate a trained model <a class="anchor" id="head-5"></a>

In this section, we run the `evaluate` tool to evaluate the trained model and produce the 3D IoU and 2D MPE metrics.
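
For intuition about the 2D MPE metric, the sketch below computes a mean pixel error as the average Euclidean distance between predicted and ground-truth 2D keypoints. This is a simplified, unnormalized reading of the metric. The `evaluate` tool may normalize or aggregate differently, and the function name and sample keypoints are made up for the example.

```python
import math


def mean_pixel_error(predicted, ground_truth):
    """Mean Euclidean distance between matching 2D keypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances)


# Two hypothetical keypoints: one exact match and one that is 5 pixels off.
pred = [(10.0, 10.0), (23.0, 44.0)]
gt = [(10.0, 10.0), (20.0, 40.0)]
print(mean_pixel_error(pred, gt))  # 2.5
```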

We provide the evaluate.yaml specification file to configure the evaluation parameters, including:

* model: configure the model setting
    * this config should remain the same as your trained model's configuration
* dataset: configure the dataset and augmentation methods
    * test_data: images and annotation files for validation data. Correct camera calibration data is required
    * num_classes: number of categories used for training, default is 1 because CenterPose is a category-based method
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
* evaluate:
    * num_gpus: number of gpus
    * checkpoint: load the saved trained CenterPose model
    * opencv: if True, returns the OpenCV format 3D keypoints (use for inference); if False, returns the OpenGL format 3D keypoints (use for evaluation)
    * eval_num_symmetry: evaluate the best accuracy by calculating different symmetric rotations (use for the symmetric objects)
    * results_dir: the directory of exporting the detailed accuracy report

* **NOTE: You need to change the evaluate.yaml file based on your setting.**

In [None]:
# Evaluate on TAO model
!tao model centerpose evaluate \
            -e $SPECS_DIR/evaluate.yaml \
            evaluate.checkpoint=$RESULTS_DIR/train/centerpose_model.pth \
            results_dir=$RESULTS_DIR/

## 6. Visualize inferences <a class="anchor" id="head-6"></a>
In this section, we run the `inference` tool to generate inferences on the trained models and visualize the results. The `inference` tool produces annotated image outputs and json files that contain prediction information.

We provide the infer.yaml specification file to configure the inference parameters, including:

* model: configure the model setting
    * this config should remain the same as your trained model's configuration
* dataset: configure the dataset and augmentation methods
    * inference_data: inference images. The json files are not required, but a correct camera intrinsic matrix is
    * num_classes: number of categories used for training, default is 1 because CenterPose is a category-based method
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
* inference:
    * checkpoint: load the saved trained CenterPose model
    * visualization_threshold: the confidence score threshold
    * principle_point_x: x-coordinate of the principal point (camera intrinsic matrix)
    * principle_point_y: y-coordinate of the principal point (camera intrinsic matrix)
    * focal_length_x: focal length (camera intrinsic matrix)
    * focal_length_y: focal length (camera intrinsic matrix)
    * skew: skew value (camera intrinsic matrix)
    * use_pnp: flag to enable using the PnP algorithm
    * save_json: flag to enable saving the result information to a json file
    * save_visualization: flag to enable saving the visualization results locally
    * opencv: if True, returns the OpenCV format 3D keypoints (use for inference); if False, returns the OpenGL format 3D keypoints (use for evaluation)

* **NOTE: You need to change the infer.yaml file based on your setting.**
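
The camera parameters listed above are the entries of a standard pinhole intrinsic matrix. The sketch below assembles that matrix and projects a 3D point in the camera frame onto the image, which can help when checking that the values in infer.yaml match your camera. It assumes an OpenCV-style camera frame with z pointing forward. The function names and the numeric values are placeholders, not defaults from the sample spec.

```python
def build_intrinsics(focal_length_x, focal_length_y,
                     principle_point_x, principle_point_y, skew=0.0):
    """Assemble the 3x3 pinhole intrinsic matrix as nested lists."""
    return [
        [focal_length_x, skew, principle_point_x],
        [0.0, focal_length_y, principle_point_y],
        [0.0, 0.0, 1.0],
    ]


def project_point(intrinsics, point_3d):
    """Project a camera-frame point (x, y, z with z > 0) to pixel coordinates."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("the point must be in front of the camera (z > 0)")
    u = (intrinsics[0][0] * x + intrinsics[0][1] * y + intrinsics[0][2] * z) / z
    v = (intrinsics[1][1] * y + intrinsics[1][2] * z) / z
    return u, v


# Placeholder camera: a point on the optical axis lands on the principal point.
K = build_intrinsics(600.0, 600.0, 300.0, 400.0)
print(project_point(K, (0.0, 0.0, 1.0)))  # (300.0, 400.0)
```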

In [None]:
!tao model centerpose inference \
        -e $SPECS_DIR/infer.yaml \
        inference.checkpoint=$RESULTS_DIR/train/centerpose_model.pth \
        results_dir=$RESULTS_DIR/

In [None]:
# Simple grid visualizer
!pip3 install 'matplotlib>=3.3.3, <4.0'
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.png']

def visualize_images(output_path, num_cols=4, num_images=10):
    num_rows = int(ceil(float(num_images) / float(num_cols)))
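    # squeeze=False keeps axarr two-dimensional even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[40,30], squeeze=False)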
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 

In [None]:
# Visualizing the sample images.
# Note that the sample spec is not meant to produce SOTA (state-of-the-art) accuracy on Objectron dataset.
IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], "inference")
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

## 7. Deploy <a class="anchor" id="head-7"></a>
This section covers exporting the trained model to ONNX and deploying it with TensorRT.

### 7.1 Export the trained model to ONNX model
The `export` tool exports the trained CenterPose model to an ONNX model.

We provide an export.yaml specification file to configure the export parameters, including:

* model: configure the model setting
    * keep this section identical to the configuration used to train your model
* export: configure the export settings
    * checkpoint: the saved trained CenterPose model to load
    * onnx_file: the output path of the exported ONNX model
    * input_channel: the number of input channels of the ONNX model
    * input_width: the input width of the ONNX model
    * input_height: the input height of the ONNX model
    * opset_version: the ONNX opset version used for the export
    * do_constant_folding: flag that enables constant folding (set to True if the TensorRT version is < 8.6)

* **NOTE: You need to change the export.yaml file based on your setting.**
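
The `do_constant_folding` guidance above depends on the TensorRT version used for deployment. A minimal sketch of that rule follows. `needs_constant_folding` is a hypothetical helper, and it assumes a version string of the form `major.minor[.patch]`. The notebook only states that the flag should be True below 8.6, so a False result here means "not required by that rule" and nothing more.

```python
def needs_constant_folding(trt_version):
    """Return True when the TensorRT version is older than 8.6."""
    major, minor = (int(part) for part in trt_version.split(".")[:2])
    return (major, minor) < (8, 6)


# Example version strings (placeholders, not a statement about your setup).
for version in ("8.5.3", "8.6.1", "10.0.1"):
    print(version, needs_constant_folding(version))
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings lexically, where "10.0" would sort before "8.6".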

In [None]:
!mkdir -p $HOST_RESULTS_DIR/export

In [None]:
# Export the RGB model to ONNX
!tao model centerpose export \
        -e $SPECS_DIR/export.yaml \
            export.checkpoint=$RESULTS_DIR/train/centerpose_model.pth \
            export.onnx_file=$RESULTS_DIR/export/centerpose_model.onnx

### 7.2 Generate the TensorRT engine from the ONNX model
We provide a gen_trt_engine.yaml specification file to configure TensorRT engine generation. The parameters include:

* gen_trt_engine: configure the engine generation settings
    * onnx_file: the path of the ONNX model to load
    * trt_engine: the output path of the TensorRT engine
    * tensorrt: configure the TensorRT engine settings
        * data_type: the precision of the TensorRT engine: "fp32", "fp16", or "int8"
        * min_batch_size: minimum batch size of the TensorRT engine
        * opt_batch_size: optimal batch size of the TensorRT engine
        * max_batch_size: maximum batch size of the TensorRT engine
        * calibration: TensorRT calibration settings (only used in "int8" mode)
            * cal_image_dir: directory of images used to compute the calibration file
            * cal_cache_file: calibration cache file for the above image directory
            * cal_batch_size: batch size used during calibration

* **NOTE: You need to change the gen_trt_engine.yaml file based on your setting.**
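
The three batch-size fields describe the range of batch sizes the engine has to support, so they should satisfy min <= opt <= max, and the calibration block only matters in "int8" mode. The check below is a sketch under those assumptions. `check_tensorrt_section` is a hypothetical helper that works on a plain dict mirroring the `tensorrt` section, the example values are placeholders, and treating all three calibration keys as mandatory for "int8" is an assumption made for illustration.

```python
VALID_DATA_TYPES = {"fp32", "fp16", "int8"}


def check_tensorrt_section(trt):
    """Return a list of problems found in a dict mirroring the tensorrt section."""
    problems = []
    if trt.get("data_type") not in VALID_DATA_TYPES:
        problems.append("data_type must be one of fp32, fp16, int8")
    lo, opt, hi = (trt.get(key) for key in
                   ("min_batch_size", "opt_batch_size", "max_batch_size"))
    if None in (lo, opt, hi):
        problems.append("min/opt/max batch sizes must all be set")
    elif not 1 <= lo <= opt <= hi:
        problems.append("batch sizes must satisfy 1 <= min <= opt <= max")
    if trt.get("data_type") == "int8":
        # Assumption: every calibration key is needed in int8 mode.
        calibration = trt.get("calibration") or {}
        for key in ("cal_image_dir", "cal_cache_file", "cal_batch_size"):
            if not calibration.get(key):
                problems.append(f"int8 mode needs calibration.{key}")
    return problems


example = {"data_type": "fp16", "min_batch_size": 1,
           "opt_batch_size": 2, "max_batch_size": 4}
print(check_tensorrt_section(example))
```

Running a check like this before `gen_trt_engine` surfaces an inconsistent batch range early, before the longer engine build starts.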

In [None]:
# Generate TensorRT engine using tao deploy
!tao deploy centerpose gen_trt_engine -e $SPECS_DIR/gen_trt_engine.yaml \
                               gen_trt_engine.onnx_file=$RESULTS_DIR/export/centerpose_model.onnx \
                               gen_trt_engine.trt_engine=$RESULTS_DIR/gen_trt_engine/centerpose_model.engine \
                               results_dir=$RESULTS_DIR

### 7.3 Evaluate with the generated TensorRT engine
TAO Deploy provides a tool that evaluates the data with the generated TensorRT engine.

* **NOTE: You need to change the evaluate.yaml file based on your setting.**

In [None]:
# Evaluate with generated TensorRT engine
!tao deploy centerpose evaluate -e $SPECS_DIR/evaluate.yaml \
                              evaluate.trt_engine=$RESULTS_DIR/gen_trt_engine/centerpose_model.engine \
                              results_dir=$RESULTS_DIR/

### 7.4 Run inference on images with the generated TensorRT engine
TAO Deploy provides a tool that runs inference on the data with the generated TensorRT engine and outputs the visualization results and the related JSON file.

* **NOTE: You need to change the infer.yaml file based on your setting.**

In [None]:
# Inference with generated TensorRT engine
!tao deploy centerpose inference -e $SPECS_DIR/infer.yaml \
                              inference.trt_engine=$RESULTS_DIR/gen_trt_engine/centerpose_model.engine \
                              results_dir=$RESULTS_DIR/

This notebook has come to an end.