# CenterPose Model Training using Synthetic Data from Omniverse Replicator Objects Extension

[NVIDIA Omniverse™ Isaac Sim](https://docs.omniverse.nvidia.com/isaacsim/latest/index.html) is a robotics simulation toolkit for the NVIDIA Omniverse™ platform. Isaac Sim has essential features for building virtual robotic worlds and experiments. It provides researchers and practitioners with the tools and workflows they need to create robust, physically accurate simulations and synthetic datasets.

## What is the Replicator Object Extension?

[omni.replicator.object](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/tutorial_replicator_object.html) is an extension that generates synthetic data for model training without requiring any changes to the code. It can be used for various tasks, such as retail object detection and robotics. The extension takes a YAML description file as input, which describes a mutable scene or a hierarchy of stacked description files. It then outputs a description file along with graphics content, including RGB images, 2D/3D bounding boxes, segmentation masks, and more.

<img align="center" src="https://docs.omniverse.nvidia.com/isaacsim/latest/_images/overview.png" width="540">

## Learning Objectives

In this notebook, you will learn how to generate a synthetic dataset for CenterPose TAO training using the Omniverse Replicator Objects extension. The objectives of this notebook are as follows:

- Setting up a configuration file for generating a synthetic dataset in Omniverse.
- Generating the synthetic training data based on the configuration file.
- Running post-processing methods to adjust and verify the annotations.
- Training the CenterPose model using TAO.

By the end of this notebook, you will have generated a synthetic training set and a trained CenterPose model.


## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires you to set an environment variable called `$LOCAL_PROJECT_DIR` to the path of your workspace. The generated synthetic dataset is expected to reside in `$LOCAL_PROJECT_DIR/data/centerpose`, while the collateral generated by the TAO experiments is written to `$LOCAL_PROJECT_DIR/centerpose/results`. More information on setting up the dataset and on the supported steps of the TAO workflow is provided in the subsequent cells.

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to Docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher.

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, and results. You should configure it for your specific case so that these directories are correctly visible to the Docker container.
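Because the TAO container only sees host paths through these mounts, it can help to know where a given host file appears inside the container. The sketch below is a hypothetical helper (not part of the TAO launcher) that maps a host path to a container path using a list shaped like the `"Mounts"` entry of `~/.tao_mounts.json`. When mounts overlap (for example, `$LOCAL_PROJECT_DIR` and `$LOCAL_PROJECT_DIR/data/centerpose`), a file is reachable through both; this sketch simply picks the most specific mount.

```python
import os


def to_container_path(host_path, mounts):
    """Map a host path to its path inside the container.

    `mounts` is a list of {"source": ..., "destination": ...} dicts, in the same
    shape as the "Mounts" entry of ~/.tao_mounts.json. Returns None if the path
    is not under any mounted source directory.
    """
    host_path = os.path.normpath(os.path.abspath(host_path))
    best = None
    for mount in mounts:
        source = os.path.normpath(os.path.abspath(mount["source"]))
        # The path must be the source directory itself or live underneath it.
        if host_path == source or host_path.startswith(source + os.sep):
            # Prefer the longest (most specific) matching source directory.
            if best is None or len(source) > len(best[0]):
                best = (source, mount["destination"])
    if best is None:
        return None
    source, destination = best
    relative = os.path.relpath(host_path, source)
    return destination if relative == "." else os.path.join(destination, relative)


mounts = [
    {"source": "/home/user/tao-experiments", "destination": "/workspace/tao-experiments"},
    {"source": "/home/user/tao-experiments/data/centerpose", "destination": "/data"},
]
print(to_container_path("/home/user/tao-experiments/data/centerpose/config.yaml", mounts))
# prints /data/config.yaml on Linux
```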


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "centerpose")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "centerpose", "results")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/centerpose

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [3]:
# Map the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
    "Mounts": [
        # Mapping the local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": os.environ["HOST_DATA_DIR"],
            "destination": "/data"
        },
        {
            "source": os.environ["HOST_SPECS_DIR"],
            "destination": "/specs"
        },
        {
            "source": os.environ["HOST_RESULTS_DIR"],
            "destination": "/results"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
    }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)


In [None]:
!cat ~/.tao_mounts.json

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with Python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, set the version of Python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where `x` is between 6 and 8, inclusive.

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver >= 455
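A quick way to check the Python requirement from the environment in which you will launch the notebook is sketched below. The helper name `python_version_ok` is hypothetical, and the bounds are taken from the list above (`>=3.7, <=3.10.x`); it does not check the Docker or NVIDIA components.

```python
import sys


def python_version_ok(version_info=sys.version_info, minimum=(3, 7), maximum=(3, 10)):
    """Return True if major.minor lies in [minimum, maximum], inclusive.

    Only major and minor are compared, so any 3.10.x patch release passes
    the "<=3.10.x" upper bound.
    """
    major_minor = (version_info[0], version_info[1])
    return minimum <= major_minor <= maximum


print("Python {}.{} supported: {}".format(
    sys.version_info[0], sys.version_info[1], python_version_ok()))
```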

Once you have installed the prerequisites, please log in to the Docker registry nvcr.io by running the following command:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.
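If you prefer a non-interactive login (for example, from a script), `docker login` also accepts `--username` and `--password-stdin`. The sketch below assumes your NGC API key is available to Python, for instance in an environment variable you name `NGC_API_KEY` (a name chosen here for illustration). Passing the arguments as a list avoids a shell, so the literal username `$oauthtoken` is not expanded as a shell variable, and the key never appears on the command line.

```python
import subprocess


def docker_login_args(registry="nvcr.io"):
    """Build the argv for a non-interactive `docker login` to the NGC registry."""
    # "$oauthtoken" is the literal username NGC expects; no shell is involved,
    # so the dollar sign is passed through unchanged.
    return ["docker", "login", registry, "--username", "$oauthtoken", "--password-stdin"]


def ngc_docker_login(api_key, registry="nvcr.io"):
    """Run `docker login`, feeding the API key on stdin."""
    return subprocess.run(
        docker_login_args(registry),
        input=api_key,
        text=True,
        check=True,
        capture_output=True,
    )


# Usage (requires Docker and an NGC API key; NGC_API_KEY is a hypothetical variable name):
# import os
# ngc_docker_login(os.environ["NGC_API_KEY"])
```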


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Generate synthetic dataset and verify the annotations <a class="anchor" id="head-2"></a>
### 2.1 Set up the configuration file for the Omniverse Replicator
In this section, we will use the "pallet" dataset for the tutorial. The following script will automatically generate the synthetic "pallet" dataset:

- Please note that all the USD contents are sourced from `omniverse://content.ov.nvidia.com/`.
- Make sure you have the necessary access before running the code.
- More details for the hyper-parameters can be found in the [omni.replicator.object](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/tutorial_replicator_object.html) page.


In [None]:
config_file="""
omni.replicator.object:
  version: 0.2.16
  num_frames: 20
  seed: 100
  inter_frame_time: 1
  gravity: 10000
  position_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start:
      - -94.77713317047056
      - 0
      - -35.661244451558446
      end:
      - -94.77713317047056
      - 0
      - -35.661244451558446
  screen_height: 720
  focal_length: 14.228393962367306
  output_path: /tmpsrc/results
  horizontal_aperture: 20.955
  screen_width: 1080
  camera_parameters:
    far_clip: 100000
    focal_length: $(focal_length)
    horizontal_aperture: $(horizontal_aperture)
    near_clip: 0.1
    screen_height: $(screen_height)
    screen_width: $(screen_width)
  default_camera:
    count: 1
    camera_parameters: $(camera_parameters)
    transform_operators:
    - translate_global:
        distribution_type: harmonized
        harmonizer_name: position_H
    - rotateY: $[seed]*20
    - rotateX:
        distribution_type: range
        start: -15
        end: -25
    - translate:
        distribution_type: range
        start:
        - -40
        - -30
        - 400
        end:
        - 40
        - 30
        - 550
    type: camera
  distant_light:
    color:
      distribution_type: range
      end:
      - 1.3
      - 1.3
      - 1.3
      start:
      - 0.7
      - 0.7
      - 0.7
    count: 5
    intensity:
      distribution_type: range
      end: 600
      start: 150
    subtype: distant
    transform_operators:
    - rotateY:
        distribution_type: range
        end: 180
        start: -180
    - rotateX:
        distribution_type: range
        end: -10
        start: -40
    type: light
  dome_light:
    type: light
    subtype: dome
    color:
      distribution_type: range
      start:
      - 0.7
      - 0.7
      - 0.7
      end:
      - 1.3
      - 1.3
      - 1.3
    intensity:
      distribution_type: range
      start: 1000
      end: 3000
    transform_operators:
    - rotateX: 270
  plane:
    physics: collision
    type: geometry
    subtype: plane
    tracked: false
    transform_operators:
    - scale:
      - 5
      - 5
      - 5
  rotY_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start: 0
      end: 0
  translate_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start:
      - 0
      - 60
      - 0
      end:
      - 0
      - 30
      - 0

  pallet:
    count: 2
    physics: rigidbody
    type: geometry
    subtype: mesh
    tracked: true
    transform_operators:
    - translate_global:
        distribution_type: harmonized
        harmonizer_name: position_H
    - translate:
      - 120 * ($[index]%2)
      - 10 * ($[index]-1) * ($[index])
      - 0
    - rotateXYZ:
      - -90
      - 0
      - 0
    - scale:
      - 1
      - 1
      - 1
    usd_path:
      distribution_type: set
      values: 
      - omniverse://content.ov.nvidia.com/NVIDIA/Assets/DigitalTwin/Assets/Warehouse/Shipping/Pallets/Wood/Block_A/BlockPallet_A08_PR_NVD_01.usd
  box:
    count: 2
    physics: rigidbody
    type: geometry
    subtype: mesh
    tracked: false
    transform_operators:
    - translate_global:
        distribution_type: harmonized
        harmonizer_name: position_H
    - translate_pallet:
        distribution_type: harmonized
        harmonizer_name: translate_H
    - rotateY:
        distribution_type: harmonized
        harmonizer_name: rotY_H
    - translate:
      - 120 * ($[index])
      - 20
      - 0
    - rotateXYZ:
      - 0
      - -90
      - -90
    - scale:
      - 12
      - 10
      - 6
    usd_path:
      distribution_type: set
      values:
      - omniverse://content.ov.nvidia.com/NVIDIA/Assets/DigitalTwin/Assets/Warehouse/Shipping/Cardboard_Boxes/White_A/WhiteCorrugatedBox_A01_10x10x10cm_PR_NVD_01.usd
      - omniverse://content.ov.nvidia.com/NVIDIA/Assets/DigitalTwin/Assets/Warehouse/Shipping/Cardboard_Boxes/Cube_A/CubeBox_A01_10cm_PR_NVD_01.usd
  warehouse:
    type: geometry
    subtype: mesh
    usd_path: omniverse://content.ov.nvidia.com/NVIDIA/Assets/Isaac/2023.1.1/Isaac/Environments/Simple_Warehouse/warehouse_with_forklifts.usd
    transform_operators:
    - translate:
      - -200
      - 0.1
      - 0
    - rotateXYZ:
      - 0
      - -90
      - -90
    - scale:
      - 100
      - 100
      - 100

  output_switches:
    images: True
    labels: True
    descriptions: False
    3d_labels: True
    segmentation: False
"""

In [None]:
# Parse the configuration and save it to $HOST_DATA_DIR/config.yaml.
import yaml
yaml_file = yaml.safe_load(config_file)
with open(os.path.join(os.getenv("HOST_DATA_DIR", os.getcwd()), 'config.yaml'), 'w') as outfile:
    yaml.dump(yaml_file, outfile, default_flow_style=False)


* You can adjust the hyperparameters in the config file to generate synthetic data for your own use case.
* **If you are using content from Omniverse Nucleus servers, you need to set your own OMNI_USER and OMNI_PASS variables.**
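The camera entries in the config (`focal_length`, `horizontal_aperture`, `screen_width`, `screen_height`) determine the pinhole intrinsics that later appear in the `camera_data.intrinsics` field of each annotation. The sketch below shows the usual conversion for a pinhole camera, assuming square pixels, no aperture offset (principal point at the image center), and focal length and aperture expressed in the same units. Treat it as an approximation and compare its output against the `fx`, `fy`, `cx`, `cy` values in a generated JSON file before relying on it.

```python
def intrinsics_from_config(focal_length, horizontal_aperture, screen_width, screen_height):
    """Approximate pinhole intrinsics (fx, fy, cx, cy) in pixels.

    Assumes square pixels and a principal point at the image center.
    """
    # Pixels per unit of sensor width, multiplied by the focal length in the same units.
    fx = screen_width * focal_length / horizontal_aperture
    fy = fx  # square pixels
    cx = screen_width / 2.0
    cy = screen_height / 2.0
    return fx, fy, cx, cy


# Values from the sample config above.
fx, fy, cx, cy = intrinsics_from_config(
    focal_length=14.228393962367306,
    horizontal_aperture=20.955,
    screen_width=1080,
    screen_height=720,
)
print("fx={:.2f} fy={:.2f} cx={:.1f} cy={:.1f}".format(fx, fy, cx, cy))
```

Because only the ratio `focal_length / horizontal_aperture` matters, changing both by the same factor leaves the intrinsics unchanged, while changing `screen_width` scales `fx` and `fy` proportionally.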

In [None]:
# Define the Omniverse password
%env OMNI_PASS=YOUR_OWN_OMNI_PASSWORD

### 2.2 Launch the synthetic data generation
Launch the synthetic data generation with the Omniverse replicator object extension inside the container.

The following example is one of the synthetic data generation scenes. You can modify the config file to generate different scenes with various objects, backgrounds, and numbers of target objects.
* Note that the current synthetic data generation pipeline only supports a single GPU.

In [None]:
!docker run --gpus device=0 -it \
    --entrypoint /bin/bash \
    --network host \
    -v $HOST_DATA_DIR:/tmpsrc \
    -e OMNI_USER='$omni-api-token' \
    -e OMNI_PASS=$OMNI_PASS \
    nvcr.io/nvidia/isaac-sim:4.0.0 \
    -c "apt-get update && apt-get install libglib2.0-dev -y && bash isaac-sim.sh --no-window --allow-root --/windowless=True --allow-root --/log/outputStreamLevel=fatal --/app/extensions/fastImporter/enabled=false --enable omni.replicator.object --/config/file=/tmpsrc/config.yaml"

### 2.3 Visualize the generated data
In this section, we use a simple grid visualizer to inspect the generated synthetic data. The generation tool produces synthetic images and corresponding JSON files that contain the training annotations.

Once the synthetic data is generated, it is stored in `$HOST_DATA_DIR/results` (the config sets `output_path: /tmpsrc/results`, and `$HOST_DATA_DIR` is mounted at `/tmpsrc`). It can be visualized using the following commands.
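The visualization and post-processing cells below pair each image in `results/images` with a JSON file of the same name in `results/3d_labels`. A sketch of that pairing using `pathlib` is shown here; the function names are hypothetical. Unlike plain string replacement on the full path, it only swaps the directory and extension, matches extensions case-insensitively, and reports images whose label file is missing.

```python
from pathlib import Path


def label_path_for(image_path):
    """Map results/images/<stem>.<ext> to results/3d_labels/<stem>.json."""
    image_path = Path(image_path)
    return image_path.parent.parent / "3d_labels" / (image_path.stem + ".json")


def pair_images_with_labels(results_dir, image_exts=(".jpg",)):
    """Return (pairs, missing) for images under results_dir/images.

    pairs: sorted list of (image_path, label_path) where the label file exists.
    missing: image paths with no matching label file.
    """
    exts = {ext.lower() for ext in image_exts}
    pairs, missing = [], []
    for image_path in sorted((Path(results_dir) / "images").iterdir()):
        if image_path.suffix.lower() not in exts:
            continue
        label_path = label_path_for(image_path)
        if label_path.is_file():
            pairs.append((image_path, label_path))
        else:
            missing.append(image_path)
    return pairs, missing


# Usage in this notebook (os is imported in an earlier cell):
# pairs, missing = pair_images_with_labels(Path(os.environ["HOST_DATA_DIR"]) / "results")
```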


In [None]:
# Install the visualization dependencies.
!pip3 install "matplotlib>=3.3.3, <4.0"
!pip3 install opencv-python==4.8.0.74
!pip3 install numpy==1.24.4
!pip3 install pyrr
!pip3 install scipy

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg']

def visualize_images(output_path, num_cols=4, num_images=10):
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2D even when there is only one row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[40, 30], squeeze=False)
    f.tight_layout()
    images = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
              if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(images[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualize a few of the generated sample images.
IMAGE_DIR = os.path.join(os.environ['HOST_DATA_DIR'], 'results', 'images')
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

### 2.4 Visualize and verify the synthetic data annotation
In this section, we visualize the annotations for the generated synthetic data using the corresponding JSON files.
- We first set up a PnP solver to verify the 2D keypoints, camera intrinsics, and object scale.
- We then draw the annotations on the images.
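Both checks rely on the pinhole camera model: a 3D point (X, Y, Z) in OpenCV camera coordinates (Z forward) projects to pixel (fx * X / Z + cx, fy * Y / Z + cy), and the reprojection error measures how far the annotated 2D keypoints are from these projections. The pure-Python sketch below (helper names are hypothetical; it ignores lens distortion, matching the zero distortion coefficients used by the solver below) shows the computation that `cv2.projectPoints` and the RMS `reprojectionError` returned by `cv2.solvePnPGeneric` summarize.

```python
import math


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (Z forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera (Z <= 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def rms_reprojection_error(points_cam, keypoints_2d, fx, fy, cx, cy):
    """Root-mean-square pixel distance between projected points and annotated keypoints."""
    squared = []
    for point_cam, (u, v) in zip(points_cam, keypoints_2d):
        pu, pv = project_point(point_cam, fx, fy, cx, cy)
        squared.append((pu - u) ** 2 + (pv - v) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

A small RMS error (a few pixels) for the ground-truth pose indicates that the 2D keypoints, the object scale, and the intrinsics in the annotation are mutually consistent.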

In [None]:
import numpy as np
import cv2
from enum import IntEnum
from pyrr import Quaternion
from scipy.spatial.transform import Rotation as R

class CuboidVertexType(IntEnum):
    FrontTopRight = 0
    FrontTopLeft = 1
    FrontBottomLeft = 2
    FrontBottomRight = 3
    RearTopRight = 4
    RearTopLeft = 5
    RearBottomLeft = 6
    RearBottomRight = 7
    Center = 8
    TotalCornerVertexCount = 8  # Corner vertices don't include the center point
    TotalVertexCount = 9

class Cuboid3d():
    '''This class contains a 3D cuboid.'''

    # Create a box with a certain size
    def __init__(self, size3d=[1.0, 1.0, 1.0],
                 coord_system=None, parent_object=None):
        # NOTE: This local coordinate system is similar
        # to the intrinsic transform matrix of a 3d object
        self.center_location = [0, 0, 0]
        # self.center_location = [size3d[0]/2,size3d[1]/2,size3d[2]/2]
        self.coord_system = coord_system
        self.size3d = size3d
        self._vertices = [0, 0, 0] * CuboidVertexType.TotalCornerVertexCount
        # self._vertices = [0, 0, 0] * CuboidVertexType.TotalVertexCount

        self.generate_vertexes()

    def get_vertex(self, vertex_type):
        """Returns the location of a vertex.

        Args:
            vertex_type: enum of type CuboidVertexType

        Returns:
            Numpy array(3) - Location of the vertex type in the cuboid
        """
        return self._vertices[vertex_type]

    def get_vertices(self):
        return self._vertices

    def generate_vertexes(self):
        width, height, depth = self.size3d

        # By default just use the normal OpenCV coordinate system
        if (self.coord_system is None):
            cx, cy, cz = self.center_location
            # X axis point to the right
            right = cx + width / 2.0
            left = cx - width / 2.0
            # Y axis point upward
            top = cy + height / 2.0
            bottom = cy - height / 2.0
            # Z axis point forward
            front = cz + depth / 2.0
            rear = cz - depth / 2.0

            # List of 8 vertices of the box
            self._vertices = [
                # self.center_location,   # Center
                [left, bottom, rear],  # Rear Bottom Left
                [left, bottom, front],  # Front Bottom Left
                [left, top, rear],  # Rear Top Left
                [left, top, front],  # Front Top Left

                [right, bottom, rear],  # Rear Bottom Right
                [right, bottom, front],  # Front Bottom Right
                [right, top, rear],  # Rear Top Right
                [right, top, front],  # Front Top Right

            ]

class CuboidPNPSolver(object):
    """
    This class is used to find the 6-DoF pose of a cuboid given its projected vertices.

    Runs perspective-n-point (PNP) algorithm.
    """

    # Class variables
    cv2version = cv2.__version__.split('.')
    cv2majorversion = int(cv2version[0])

    def __init__(self, scaling_factor=1,
                 camera_intrinsic_matrix=None,
                 cuboid3d=None,
                 dist_coeffs=np.zeros((4, 1)),
                 min_required_points=4
                 ):

        self.min_required_points = max(4, min_required_points)
        self.scaling_factor = scaling_factor

        if camera_intrinsic_matrix is not None:
            self._camera_intrinsic_matrix = camera_intrinsic_matrix
        else:
            self._camera_intrinsic_matrix = np.array([
                [0, 0, 0],
                [0, 0, 0],
                [0, 0, 0]
            ])
        self._cuboid3d = cuboid3d
        
        self._dist_coeffs = dist_coeffs
        
    def set_camera_intrinsic_matrix(self, new_intrinsic_matrix):
        '''Sets the camera intrinsic matrix'''
        self._camera_intrinsic_matrix = new_intrinsic_matrix

    def set_dist_coeffs(self, dist_coeffs):
        '''Sets the distortion coefficients'''
        self._dist_coeffs = dist_coeffs

    def solve_pnp(self,
                  cuboid2d_points,
                  pnp_algorithm=None,
                  OPENCV_RETURN = True,
                  verbose = False
                  ):
        """
        Detects the rotation and translation
        of a cuboid object from the 2D locations
        of its vertices in the image

        Inputs:
        - cuboid2d_points:  list of XY tuples
        - pnp_algorithm: None  
          ...

        Outputs:
        - location in 3D
        - pose in 3D (as quaternion)
        - projected points:  np.ndarray of np.ndarrays

        """
        
        # Fallback to default PNP algorithm base on OpenCV version
        if pnp_algorithm is None:
            if CuboidPNPSolver.cv2majorversion == 2:
                pnp_algorithm = cv2.CV_ITERATIVE
            elif CuboidPNPSolver.cv2majorversion == 3:
                pnp_algorithm = cv2.SOLVEPNP_ITERATIVE
            else:
                pnp_algorithm = cv2.SOLVEPNP_ITERATIVE

        location = None
        quaternion = None
        location_new = None
        quaternion_new = None
        reprojectionError = None

        projected_points = cuboid2d_points
        cuboid3d_points = np.array(self._cuboid3d.get_vertices())
        
        obj_2d_points = []
        obj_3d_points = []

        # 8*n points
        for i in range(len(cuboid2d_points)):
            check_point_2d = cuboid2d_points[i]
            # Ignore invalid points
            if (check_point_2d is None or check_point_2d[0] < -5000 or check_point_2d[1] < -5000):
                continue
            obj_2d_points.append(check_point_2d)
            obj_3d_points.append(
                cuboid3d_points[int(i // (len(cuboid2d_points) / CuboidVertexType.TotalCornerVertexCount))]) # TotalCornerVertexCount = 8 
        
        obj_2d_points = np.array(obj_2d_points, dtype=float)
        obj_3d_points = np.array(obj_3d_points, dtype=float)
        valid_point_count = len(obj_2d_points)

        # Can only do PNP if we have more than 3 valid points
        is_points_valid = valid_point_count >= self.min_required_points

        if is_points_valid:

            # Heatmap representation may have less than 6 points, in which case we have to use another pnp algorithm
            if valid_point_count < 6:
                pnp_algorithm = cv2.SOLVEPNP_EPNP

            # Usually, we use this one
            ret, rvec, tvec, reprojectionError = cv2.solvePnPGeneric(
                obj_3d_points,
                obj_2d_points,
                self._camera_intrinsic_matrix,
                self._dist_coeffs,
                flags=pnp_algorithm
            )
            
            if ret:

                rvec = np.array(rvec[0])
                tvec = np.array(tvec[0])

                reprojectionError = reprojectionError.flatten()[0]

                # Convert OpenCV coordinate system to OpenGL coordinate system
                transformation = np.identity(4)
                r = R.from_rotvec(rvec.reshape(1, 3))
                transformation[:3, :3] = r.as_matrix()
                transformation[:3, 3] = tvec.reshape(1, 3)
                M = np.zeros((4, 4))
                M[0, 1] = 1
                M[1, 0] = 1
                M[3, 3] = 1
                M[2, 2] = -1
                transformation = np.matmul(M, transformation)

                rvec_new = R.from_matrix(transformation[:3, :3]).as_rotvec()
                tvec_new = transformation[:3, 3]

                # OpenGL result, to be compared against GT
                location_new = list(x for x in tvec_new)
                quaternion_new = self.convert_rvec_to_quaternion(rvec_new)

                # OpenCV result
                location = list(x[0] for x in tvec)
                quaternion = self.convert_rvec_to_quaternion(rvec)
                
                # Still use OpenCV way to project 3D points
                projected_points, _ = cv2.projectPoints(cuboid3d_points, rvec, tvec, self._camera_intrinsic_matrix,
                                                        self._dist_coeffs)
                
                projected_points = np.squeeze(projected_points)
                
                x, y, z = location
                if z < 0:
                    location = None
                    quaternion = None
                    location_new = None
                    quaternion_new = None

                    if verbose:
                        print("PNP solution is behind the camera (Z < 0) => Fail")
                else:
                    if verbose:
                        print("solvePNP found good results - location: {} - rotation: {} !!!".format(location, quaternion))
            else:
                if verbose:
                    print('Error:  solvePnP return false ****************************************')
        else:
            if verbose:
                print("Need at least 4 valid points in order to run PNP. Currently: {}".format(valid_point_count))

        if OPENCV_RETURN:
            # Return OpenCV result for demo
            return location, quaternion, projected_points, reprojectionError
        else:
            # Return OpenGL result for eval
            return location_new, quaternion_new, projected_points, reprojectionError
        
    def convert_rvec_to_quaternion(self, rvec):
        '''Convert rvec (axis-angle, i.e. a log quaternion) to quaternion'''
        rvec = np.asarray(rvec, dtype=float).reshape(3)
        theta = np.linalg.norm(rvec)  # rotation angle in radians
        if theta < 1e-12:
            # Zero rotation: the axis is undefined, so return the identity quaternion.
            return Quaternion.from_axis_rotation([1.0, 0.0, 0.0], 0.0)
        raxis = rvec / theta
        return Quaternion.from_axis_rotation(raxis, theta)

def pnp_processing(points_filtered, scale, cam_intrinsic, OPENCV_RETURN=True):
    # initial a 3d cuboid
    cuboid3d = Cuboid3d(1 * np.array(scale))

    pnp_solver = \
        CuboidPNPSolver(
            cuboid3d=cuboid3d
        )
    pnp_solver.set_camera_intrinsic_matrix(cam_intrinsic)
    
    location, quaternion, projected_points, reprojectionError = pnp_solver.solve_pnp(
        points_filtered, OPENCV_RETURN=OPENCV_RETURN)  # N * 2
    
    if location is not None:

        ori = R.from_quat(quaternion).as_matrix()
        pose_pred = np.identity(4)
        pose_pred[:3, :3] = ori
        pose_pred[:3, 3] = location
        point_3d_obj = cuboid3d.get_vertices()
        
        point_3d_cam = pose_pred @ np.hstack(
            (np.array(point_3d_obj), np.ones((np.array(point_3d_obj).shape[0], 1)))).T
        point_3d_cam = point_3d_cam[:3, :].T  # 8 * 3
        
        
        # Add the centroid
        point_3d_cam = np.insert(point_3d_cam, 0, np.mean(point_3d_cam, axis=0), axis=0)
        
        # Add the center
        projected_points = np.insert(projected_points, 0, np.mean(projected_points, axis=0), axis=0)
        
        return projected_points, point_3d_cam, location, quaternion

    return None

- Note that the replicator writes images in `.jpg` format by default. If you generated images in another format, update the image extension list below.

In [None]:
import json
import cv2
valid_image_ext = ['.jpg']

def add_obj_order(img, keypoints2d, img_id='default', pred_flag='pred'):
    bbox = np.array(keypoints2d, dtype=np.int32)
    font = cv2.FONT_HERSHEY_SIMPLEX

    for i in range(len(bbox)):
        txt = '{:d}'.format(i)
        cat_size = cv2.getTextSize(txt, font, 1, 2)[0]
        cv2.putText(img, txt, (bbox[i][0], bbox[i][1] + cat_size[1]),
                    font, 1, (0, 0, 255), thickness=2, lineType=cv2.LINE_AA)

def add_coco_hp(img, points):
    # use for drawing the 3D bounding box
    edges = [[2, 4], [2, 6], [6, 8], [4, 8],
                [1, 2], [3, 4], [5, 6], [7, 8],
                [1, 3], [1, 5], [3, 7], [5, 7]]

    num_joints = 8
    points = np.array(points, dtype=np.int32).reshape(num_joints, 2)
    # Draw edges
    for j, e in enumerate(edges):
        temp = [e[0] - 1, e[1] - 1]
        edge_color = (0, 255, 0)  # bgr
        if points[temp[1], 0] <= -10000 or points[temp[1], 1] <= -10000 or points[temp[0], 0] <= -10000 or \
                points[temp[0], 1] <= -10000:
            continue
        else:
            cv2.line(img, (points[temp[0], 0], points[temp[0], 1]),
                        (points[temp[1], 0], points[temp[1], 1]), edge_color, 2)

def add_axes(img, box, cam_intrinsic):
    # box 9x3 array
    # OpenCV way
    N = 20
    # Centroid, top, front, right
    axes_point_list = [0, box[3] - box[1], box[2] - box[1], box[5] - box[1]]
    
    viewport_point_list = []
    for axes_point in axes_point_list:
        vector = axes_point
        vector = vector / np.linalg.norm(vector) * N if np.linalg.norm(vector) != 0 else 0
        vector = vector + box[0]
        vector = vector.flatten()

        k_3d = np.array([vector[0], vector[1], vector[2]])
        pp = np.matmul(cam_intrinsic, k_3d.reshape(3, 1))
        viewport_point = [pp[0] / pp[2], pp[1] / pp[2]]
        viewport_point_list.append((int(viewport_point[0]), int(viewport_point[1])))

    # BGR space
    cv2.line(img, viewport_point_list[0], viewport_point_list[1], (0, 255, 0), 5)  # y-> green
    cv2.line(img, viewport_point_list[0], viewport_point_list[2], (255, 0, 0), 5)  # z-> blue
    cv2.line(img, viewport_point_list[0], viewport_point_list[3], (0, 0, 255), 5)  # x-> red

def visualize_annotation(imgs, anns, num_cols=4, num_images=10):
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2D even when there is only one row or column.
    fig, axarr = plt.subplots(num_rows, num_cols, figsize=[40, 30], squeeze=False)
    fig.tight_layout()
    for idx, img_path in enumerate(imgs[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        # cv2.imread returns a writable BGR array, matching the BGR colors used by the drawing helpers.
        img = cv2.imread(img_path)

        with open(anns[idx]) as ann_file:
            ann = json.load(ann_file)

        num_objs = len(ann['objects'])

        cx = ann['camera_data']['intrinsics']['cx']
        cy = ann['camera_data']['intrinsics']['cy']
        fx = ann['camera_data']['intrinsics']['fx']
        fy = ann['camera_data']['intrinsics']['fy']
        cam_intrinsic = np.array(
                    [[fx, 0, cx], [0, fy, cy], [0, 0, 1]])

        for k in range(num_objs):

            # Read the annotation of the k-th object
            ann_obj = ann['objects'][k]

            pts_ori = np.array(ann_obj['projected_cuboid'])
            scale = np.array(ann_obj['scale'])

            # Run the PnP solver to further verify the 2D keypoints, scale, and camera intrinsics.
            result = pnp_processing(pts_ori[1:], scale, cam_intrinsic)
            if result is None:
                # PnP failed (too few valid points or solution behind the camera); skip this object.
                continue
            projected_points, point_3d_cam, location, quaternion = result

            # GT visualization
            add_obj_order(img, projected_points)
            add_coco_hp(img, projected_points[1:])
            add_axes(img, point_3d_cam, cam_intrinsic)

        # Convert BGR to RGB for matplotlib.
        axarr[row_id, col_id].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

The following cell runs the PnP solver to check the 2D keypoints, object scale, and camera intrinsics against the recovered pose.
- "2D keypoints" + "Object scale" + "Camera intrinsics" ==> PnP solver ==> Object pose + projected keypoints + 3D keypoints in camera space
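The last step of that pipeline, turning the recovered pose into 3D keypoints in camera space, is what `pnp_processing` does with a 4x4 pose matrix. A dependency-free sketch of the same idea follows: it builds the 8 corners of a cuboid centered at the origin from the object `scale`, applies a rotation matrix `R` and translation `t` (p_cam = R p_obj + t), and prepends the centroid, giving the 9-point layout (center first) used for drawing. The function name and the corner ordering are illustrative only; the notebook's `Cuboid3d` uses its own vertex order.

```python
from itertools import product


def cuboid_keypoints_cam(scale, rotation, translation):
    """Return 9 points in camera space: the centroid first, then the 8 corners.

    scale: (width, height, depth) of the cuboid in object units.
    rotation: 3x3 rotation matrix (list of rows) from the object to the camera frame.
    translation: (tx, ty, tz) of the object center in camera space.
    """
    half = [s / 2.0 for s in scale]
    # Every sign combination (-1, +1) for x, y, z gives one of the 8 corners.
    corners_obj = [
        [sx * half[0], sy * half[1], sz * half[2]]
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    corners_cam = []
    for p in corners_obj:
        # p_cam = R @ p_obj + t
        corners_cam.append([
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ])
    centroid = [sum(c[i] for c in corners_cam) / 8.0 for i in range(3)]
    return [centroid] + corners_cam
```

Because the corners are symmetric about the object origin, the centroid always equals the translation `t`, which is a quick sanity check on a PnP result.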

In [None]:
# Visualize the annotations on a few sample images.
IMAGE_DIR = os.path.join(os.environ['HOST_DATA_DIR'], 'results', 'images')
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

imgs = [os.path.join(IMAGE_DIR, image) for image in os.listdir(IMAGE_DIR) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
anns = [image.replace(os.path.splitext(image)[1].lower(), '.json').replace('/images/', '/3d_labels/')
        for image in imgs]
visualize_annotation(imgs, anns, num_cols=COLS, num_images=IMAGES)

## 3. Post-processing Procedure before Launching the CenterPose Training
In this section, we run some dataset post-processing to finalize the annotations.
- Note that this step will be included in the pose writer in the next release.

### 3.1 Rotate the annotation axis and add the missing hyperparameters

In this section, we convert the annotation file to the [Objectron format](https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Parse%20Annotations.ipynb). 

The Objectron format defines the orientation of the coordinate system as follows: the positive y-axis points upwards (aligned with gravity), the positive z-axis points towards the user, and the positive x-axis follows the right-hand rule. The front face is the face on the positive z side (parallel to the xy-plane), and the top face is the face on the positive y side (parallel to the xz-plane). Therefore, we need to convert the annotations to match the Objectron format.

The original 3D label annotations are missing the image width, the image height, and the camera view matrix, so we add these fields to the annotation files as well.


In [None]:
Z_TO_Y_AXIS = [0, 2, 4, 1, 3, 6, 8, 5, 7]
SCALE_CONV = [0, 2, 1]

def annotation_converter(output_path):
    imgs = [os.path.join(output_path, image) for image in os.listdir(output_path)
            if os.path.splitext(image)[1].lower() in valid_image_ext]
    # Each image has a JSON label with the same base name under 3d_labels/.
    anns = [os.path.splitext(image)[0].replace('/images/', '/3d_labels/') + '.json'
            for image in imgs]

    for img_path, ann_path in zip(imgs, anns):

        with open(ann_path) as f:
            ann = json.load(f)

        # Add the image size and an identity camera view matrix to the camera data.
        img = cv2.imread(img_path)
        height, width, _ = img.shape
        if "width" not in ann["camera_data"]:
            ann["camera_data"]["width"] = width
        if "height" not in ann["camera_data"]:
            ann["camera_data"]["height"] = height
        ann["camera_data"]["camera_view_matrix"] = np.eye(4).tolist()

        dict_out = {"camera_data": ann["camera_data"], "objects": []}
        for ann_obj in ann['objects']:

            pts_ori = np.array(ann_obj['projected_cuboid'])
            scale = np.array(ann_obj['scale'])
            pts_3d = np.array(ann_obj['keypoints_3d'])

            # Reorder the 2D and 3D keypoints for the new axis convention.
            # Indexing with Z_TO_Y_AXIS three times composes the permutation three times.
            new_keypoints_3d = pts_3d[Z_TO_Y_AXIS][Z_TO_Y_AXIS][Z_TO_Y_AXIS]
            new_keypoints_2d = pts_ori[Z_TO_Y_AXIS][Z_TO_Y_AXIS][Z_TO_Y_AXIS]
            # Swap the y and z extents of the object scale.
            new_scale = scale[SCALE_CONV]

            ann_obj['projected_cuboid'] = new_keypoints_2d.tolist()
            ann_obj['keypoints_3d'] = new_keypoints_3d.tolist()
            ann_obj['scale'] = new_scale.tolist()
            ann_obj['visibility'] = 1.0
            dict_out["objects"].append(ann_obj)

        # Write the converted label next to the image, with a .json extension.
        json_save = os.path.splitext(img_path)[0] + '.json'
        print(f"Saving the updated 3d labels to {json_save}")
        with open(json_save, 'w') as fp:
            json.dump(dict_out, fp, indent=4, sort_keys=True)

annotation_converter(IMAGE_DIR)
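The converter indexes each keypoint array with `Z_TO_Y_AXIS` three times in a row. NumPy fancy indexing composes, so this equals a single lookup with the composed permutation, and because `Z_TO_Y_AXIS` consists of two 4-cycles, applying it three times is the same as applying its inverse once. Note that only the rows are reordered; the coordinate values themselves are unchanged. The sketch below checks this; the `compose` helper is hypothetical.

```python
import numpy as np

# Copied from the cell above.
Z_TO_Y_AXIS = [0, 2, 4, 1, 3, 6, 8, 5, 7]
SCALE_CONV = [0, 2, 1]


def compose(perm, times):
    """Index array equivalent to indexing an array with `perm` `times` times in a row."""
    idx = np.arange(len(perm))
    for _ in range(times):
        idx = idx[perm]
    return idx


triple = compose(Z_TO_Y_AXIS, 3)
print(triple.tolist())  # [0, 3, 1, 4, 2, 7, 5, 8, 6]

# Any (9, k) keypoint array gives the same result either way.
keypoints = np.arange(18).reshape(9, 2)
assert np.array_equal(keypoints[Z_TO_Y_AXIS][Z_TO_Y_AXIS][Z_TO_Y_AXIS], keypoints[triple])
# The cycles (1 2 4 3) and (5 6 8 7) have length 4, so one more step restores the order.
assert np.array_equal(keypoints[triple][Z_TO_Y_AXIS], keypoints)

# SCALE_CONV swaps the second and third extents: (sx, sy, sz) -> (sx, sz, sy).
print(np.array([0.2, 0.4, 0.3])[SCALE_CONV].tolist())  # [0.2, 0.3, 0.4]
```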

### 3.2 AR data calculation
CenterPose predicts 3D bounding boxes from RGB images, so its predicted 3D keypoints are only determined up to scale, whereas the ground truth is in metric scale. The two can therefore differ in scale, because recovering absolute scale from a monocular camera (a single RGB image) is ambiguous. To resolve this, the predicted box can be re-scaled using the ground plane, assuming the box rests on it. The cells below compute the ground plane (its center and normal) from each object's 3D keypoints and store it as `AR_data` in the annotation.
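A minimal sketch of the re-scaling idea, assuming the predicted keypoints are in camera coordinates (camera at the origin) and correct up to one global scale factor, and that a point `c` on the ground plane and its normal `n` are known in the same frame. Scaling every vertex by the same factor about the camera center leaves all 2D projections unchanged, so the factor can be chosen such that the predicted bottom-face center `p` lands on the plane: `s = dot(n, c) / dot(n, p)`. `rescale_to_ground_plane` is a hypothetical helper, not a TAO or Objectron API.

```python
import numpy as np


def rescale_to_ground_plane(pred_vertices, bottom_face, plane_center, plane_normal):
    """Scale an up-to-scale box (camera frame) so its bottom face lies on a plane.

    pred_vertices: (9, 3) predicted keypoints, camera at the origin, unknown scale.
    bottom_face: 1-based indices of the four corners of the face resting on the plane.
    plane_center, plane_normal: a point on the ground plane and the plane normal.
    """
    pred_vertices = np.asarray(pred_vertices, dtype=float)
    c = np.asarray(plane_center, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    p = pred_vertices[list(bottom_face)].mean(axis=0)  # predicted bottom-face center
    denom = n @ p
    if abs(denom) < 1e-12:
        raise ValueError("the viewing ray to the bottom-face center is parallel to the plane")
    # Solve n . (s * p - c) = 0 for s; scaling about the camera center keeps
    # every 2D projection unchanged, so only the metric size changes.
    s = (n @ c) / denom
    return s * pred_vertices, s


# Example: a metric box resting on the plane y = 0.1, predicted at 1/2.5 of its size.
half = np.array([0.1, 0.2, 0.15])
box_center = np.array([0.1, 0.3, 2.0])
true_vertices = np.vstack([box_center] + [box_center + np.array([bx, by, bz]) * half
                                          for bx in (-1, 1) for by in (-1, 1) for bz in (-1, 1)])
bottom = [1, 2, 6, 5]  # the -y face in Objectron corner order
plane_center = true_vertices[bottom].mean(axis=0)
rescaled, s = rescale_to_ground_plane(true_vertices / 2.5, bottom, plane_center, [0.0, 1.0, 0.0])
print(f"recovered scale factor: {s:.3f}")
```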

In [None]:
# Define the Box class used to find the ground plane under a 3D bounding box.
FACES = np.array([
    [5, 6, 8, 7],  # +x on yz plane
    [1, 3, 4, 2],  # -x on yz plane
    [3, 7, 8, 4],  # +y on xz plane = top
    [1, 2, 6, 5],  # -y on xz plane
    [2, 4, 8, 6],  # +z on xy plane = front
    [1, 5, 7, 3],  # -z on xy plane
])

class Box(object):
  """General 3D Oriented Bounding Box."""

  def __init__(self, vertices=None):

    self.vertices = vertices

  def get_ground_plane(self, gravity_axis=1):
    """Get ground plane under the box."""

    gravity = np.zeros(3)
    gravity[gravity_axis] = 1

    def get_face_normal(face, center):
      """Get a normal vector to the given face of the box."""
      v1 = self.vertices[face[0], :] - center
      v2 = self.vertices[face[1], :] - center
      normal = np.cross(v1, v2)
      return normal

    def get_face_center(face):
      """Get the center point of the face of the box."""
      center = np.zeros(3)
      for vertex in face:
        center += self.vertices[vertex, :]
      center /= len(face)
      return center

    ground_plane_id = 0
    ground_plane_error = 10.

    # The ground plane is defined as a plane aligned with gravity.
    # gravity is the (0, 1, 0) vector in the world coordinate system.
    for i in [0, 2, 4]:
      face = FACES[i, :]
      center = get_face_center(face)
      normal = get_face_normal(face, center)
      w = np.cross(gravity, normal)
      w_sq_norm = np.linalg.norm(w)
      if w_sq_norm < ground_plane_error:
        ground_plane_error = w_sq_norm
        ground_plane_id = i

    face = FACES[ground_plane_id, :]
    center = get_face_center(face)
    normal = get_face_normal(face, center)

    # For each face, there is also a parallel face whose normal is also
    # aligned with the gravity vector. We pick the face with the lower height (y-value).
    # The face parallel to face 0 is 1, to face 2 is 3, and to face 4 is 5.
    parallel_face_id = ground_plane_id + 1
    parallel_face = FACES[parallel_face_id]
    parallel_face_center = get_face_center(parallel_face)
    parallel_face_normal = get_face_normal(parallel_face, parallel_face_center)
    if parallel_face_center[gravity_axis] < center[gravity_axis]:
      center = parallel_face_center
      normal = parallel_face_normal
    return center, normal
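For comparison, here is a vectorized sketch of the same face-selection logic as `Box.get_ground_plane`, with one deliberate difference: it normalizes the face normals before comparing them with the gravity axis, so the choice depends only on orientation and not on face size (the class above compares unnormalized cross products against a fixed starting threshold of 10). Orienting the returned normal towards the positive `up_axis` is an additional assumption of this sketch, and `ground_plane` is a hypothetical name.

```python
import numpy as np

# Same corner indices (1-based, index 0 is the box center) as FACES above.
FACES = np.array([
    [5, 6, 8, 7],  # +x
    [1, 3, 4, 2],  # -x
    [3, 7, 8, 4],  # +y
    [1, 2, 6, 5],  # -y
    [2, 4, 8, 6],  # +z
    [1, 5, 7, 3],  # -z
])


def ground_plane(vertices, up_axis=1):
    """Center and unit normal of the lower face most aligned with `up_axis`."""
    vertices = np.asarray(vertices, dtype=float)
    up = np.zeros(3)
    up[up_axis] = 1.0
    corners = vertices[FACES]                 # (6, 4, 3)
    centers = corners.mean(axis=1)            # (6, 3) face centers
    # Each face lists its corners in cyclic order, so corners 0->1 and 0->3
    # are two adjacent edges; their cross product is the face normal.
    normals = np.cross(corners[:, 1] - corners[:, 0], corners[:, 3] - corners[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    # Choose the face pair (0/1, 2/3 or 4/5) whose normal best matches up.
    first = 2 * int(np.argmax(np.abs(normals[[0, 2, 4]] @ up)))
    # Of the two parallel faces, keep the lower one along up_axis.
    face = first if centers[first] @ up < centers[first + 1] @ up else first + 1
    normal = normals[face] if normals[face] @ up >= 0 else -normals[face]
    return centers[face], normal


# Example: an axis-aligned 0.2 x 0.4 x 0.3 box centered at (0.1, 0.3, 2.0).
half = np.array([0.1, 0.2, 0.15])
box_center = np.array([0.1, 0.3, 2.0])
vertices = np.vstack([box_center] + [box_center + np.array([bx, by, bz]) * half
                                     for bx in (-1, 1) for by in (-1, 1) for bz in (-1, 1)])
plane_center, plane_normal = ground_plane(vertices)
print(plane_center, plane_normal)
```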

In [None]:
def ar_data_converter(output_path):
    imgs = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    # The converted labels from step 3.1 sit next to the images with a .json extension.
    anns = [os.path.splitext(image)[0] + '.json' for image in imgs]

    for img_path, ann_path in zip(imgs, anns):

        with open(ann_path) as f:
            ann = json.load(f)

        dict_out = {"camera_data": ann["camera_data"], "objects": ann["objects"], "AR_data": {}}

        for ann_obj in ann['objects']:

            # Compute the ground plane from the object's 3D keypoints
            ann_obj['AR_data'] = {}
            pts_3d = np.array(ann_obj['keypoints_3d'])
            pts3d_box = Box(pts_3d)
            center, normal = pts3d_box.get_ground_plane()

            # Set the object-wise AR data values
            ann_obj['AR_data']['plane_center'] = center.tolist()
            ann_obj['AR_data']['plane_normal'] = normal.tolist()

            # Set the image-level AR data values (with several objects, the last object's plane is kept)
            dict_out['AR_data']['plane_center'] = center.tolist()
            dict_out['AR_data']['plane_normal'] = normal.tolist()

        json_save = os.path.splitext(img_path)[0] + '.json'
        print(f"Saving the updated 3d labels to {json_save}")
        with open(json_save, 'w') as fp:
            json.dump(dict_out, fp, indent=4, sort_keys=True)

ar_data_converter(IMAGE_DIR)

## 4. Launch the CenterPose TAO Training using the generated synthetic dataset
This section shows how to launch CenterPose training on the synthetic dataset.
More details about the hyperparameters and the end-to-end training pipeline can be found in the [CenterPose Notebook](https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/centerpose/centerpose.ipynb).

### 4.1 Download the pre-trained model
We use the NGC CLI to download the pre-trained model. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# # Installing NGC CLI on the local machine.
# ## Download and install
# %env CLI=ngccli_cat_linux.zip
# !mkdir -p $LOCAL_PROJECT_DIR/ngccli

# # Remove any previously existing CLI installations
# !rm -rf $LOCAL_PROJECT_DIR/ngccli/*
# !wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
# !unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
# !rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
# os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
# Pull pretrained model from NGC
!mkdir -p $HOST_RESULTS_DIR/pretrained_models
!ngc registry model download-version "nvidia/tao/centerpose:trainable_fan_small" --dest $HOST_RESULTS_DIR/pretrained_models
print("Check if model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_models/centerpose_vtrainable_fan_small/

### 4.2 Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * train_data: images and annotation files for train data. Required to have correct camera calibration data
    * val_data: images and annotation files for validation data. Required to have correct camera calibration data
    * num_classes: number of categories, default is 1. CenterPose is a category-level method
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
    * category: category name of the training object
    * num_symmetry: number of symmetric rotations for the specific categories, e.g. bottle
    * max_objs: maximum number of training objects in one image
* model: configure the model setting
    * down_ratio: downsampling ratio for the input image, default is 4
    * use_pretrained: flag to enable using the pretrained weights
    * model_type: backbone type of CenterPose, including the FAN variants and the DLA34 backbone
    * pretrained_backbone_path: path to the pretrained backbone model. FAN variants are supported; the DLA34 backbone loads its pretrained weights automatically.
* train: configure the training hyperparameters
    * num_gpus: number of GPUs
    * validation_interval: validation interval
    * checkpoint_interval: interval for saving checkpoints
    * num_epochs: number of epochs
    * clip_grad_val: gradient clipping value, default is 100.0
    * randomseed: random seed for reproducibility
    * resume_training_checkpoint_path: checkpoint path to resume training from
    * precision: if set to fp16, training runs with Automatic Mixed Precision (AMP)
    * optim:
        * lr: learning rate for training the model
        * lr_steps: learning rate decay step milestone (MultiStep)

Please refer to the TAO documentation for CenterPose for the full list of configurable parameters.


In [None]:
!cat $HOST_SPECS_DIR/train_synthetic.yaml

### 4.3 Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* Evaluation mainly uses the 3D IoU and 2D MPE (mean pixel error) metrics. For more information, refer to https://github.com/google-research-datasets/Objectron
* For this demonstration, the number of training epochs is set to 1 so that training finishes quickly.
* In addition to the DLA34 backbone used in the [original CenterPose paper](https://arxiv.org/abs/2109.06161), we provide a more advanced backbone, [FAN](https://arxiv.org/abs/2204.12451), which has been shown to achieve better downstream results than DLA34.
* To speed up training, you can set `train.precision=fp16` for mixed precision training.
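The two evaluation metrics can be sketched in a few lines. This is an illustration under simplifying assumptions, not the metric code TAO runs: `mean_pixel_error` averages the Euclidean pixel distance over matching 2D keypoints (the reported MPE may differ in details such as normalization), and `axis_aligned_iou_3d` only handles boxes that are axis-aligned in a shared frame, whereas Objectron's 3D IoU handles arbitrarily oriented boxes. Both function names are hypothetical.

```python
import numpy as np


def mean_pixel_error(pred_2d, gt_2d):
    """Average Euclidean distance, in pixels, between matching 2D keypoints."""
    pred_2d = np.asarray(pred_2d, dtype=float)
    gt_2d = np.asarray(gt_2d, dtype=float)
    return float(np.linalg.norm(pred_2d - gt_2d, axis=1).mean())


def axis_aligned_iou_3d(center_a, scale_a, center_b, scale_b):
    """3D IoU of two axis-aligned boxes, each given by its center and full extents."""
    ca, sa, cb, sb = (np.asarray(v, dtype=float) for v in (center_a, scale_a, center_b, scale_b))
    lo = np.maximum(ca - sa / 2, cb - sb / 2)
    hi = np.minimum(ca + sa / 2, cb + sb / 2)
    intersection = np.prod(np.clip(hi - lo, 0.0, None))
    union = np.prod(sa) + np.prod(sb) - intersection
    return float(intersection / union)


print(mean_pixel_error([[0, 0], [3, 4]], [[0, 0], [0, 0]]))                # 2.5
print(axis_aligned_iou_3d([0, 0, 0], [1, 1, 1], [0.5, 0, 0], [1, 1, 1]))  # 0.333...
```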

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env MODEL_DIR = /model
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

In [None]:
!echo $HOST_DATA_DIR

In [None]:
print("For multi-GPU, change train.num_gpus in train_synthetic.yaml based on your machine.")
# If you run out of memory, you can reduce the batch size by appending the override dataset.batch_size=2 to the command.
!tao model centerpose train \
          -e $SPECS_DIR/train_synthetic.yaml \
          results_dir=$RESULTS_DIR/
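The training command takes dotted `key=value` overrides after the spec file, as with `results_dir=$RESULTS_DIR/` in the cell above and the `train.precision=fp16` and `dataset.batch_size=2` overrides mentioned in this section. If you keep your overrides in a Python dict, a hypothetical helper like `to_overrides` can flatten it into that form and print a command to paste into a `!` cell. The list syntax (`[90,120]`) and lowercase booleans assume the launcher parses overrides the way Hydra does; check the TAO documentation before relying on them.

```python
def to_overrides(spec, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in spec.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, name))
        elif isinstance(value, bool):
            overrides.append(f"{name}={str(value).lower()}")
        elif isinstance(value, (list, tuple)):
            # Assumes Hydra-style list syntax without spaces, e.g. [90,120].
            overrides.append(f"{name}=[{','.join(str(v) for v in value)}]")
        else:
            overrides.append(f"{name}={value}")
    return overrides


overrides = to_overrides({"dataset": {"batch_size": 2},
                          "train": {"num_epochs": 1, "precision": "fp16"}})
command = ["tao", "model", "centerpose", "train",
           "-e", "$SPECS_DIR/train_synthetic.yaml",
           "results_dir=$RESULTS_DIR/"] + overrides
print(" ".join(command))
```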

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/centerpose_model_latest.pth")

print('Copy the trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/centerpose_model.pth
!ls -ltrh $HOST_RESULTS_DIR/train/centerpose_model.pth
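If you prefer to pick a checkpoint in Python rather than with the commented `ls | grep` line in the cell above, a sketch like the following works under two assumptions taken from that cell: the latest checkpoint is named `centerpose_model_latest.pth`, and epoch-specific checkpoints contain `epoch_` followed by a three-digit, zero-padded epoch number (as in `NUM_EPOCH=029`). `find_checkpoint` is a hypothetical helper.

```python
import glob
import os


def find_checkpoint(train_dir, epoch=None):
    """Return the checkpoint for `epoch`, or the latest checkpoint when epoch is None."""
    if epoch is None:
        path = os.path.join(train_dir, "centerpose_model_latest.pth")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
        return path
    # Assumed naming: "epoch_" followed by a zero-padded, three-digit epoch number.
    pattern = os.path.join(train_dir, f"*epoch_{int(epoch):03d}*.pth")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no checkpoint matching {pattern}")
    return matches[0]


# Usage in the notebook:
# os.environ["CHECKPOINT"] = find_checkpoint(os.path.join(os.environ["HOST_RESULTS_DIR"], "train"), epoch=29)
```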


This is the end of the notebook.
More details on the end-to-end training and inference pipeline can be found in the [CenterPose Notebook](https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/centerpose/centerpose.ipynb).