# Object Detection Model Training using Synthetic Data from Omniverse Replicator Objects Extension

[NVIDIA Omniverse™ Isaac Sim](https://docs.omniverse.nvidia.com/isaacsim/latest/index.html) is a robotics simulation toolkit for the NVIDIA Omniverse™ platform. Isaac Sim has essential features for building virtual robotic worlds and experiments. It provides researchers and practitioners with the tools and workflows they need to create robust, physically accurate simulations and synthetic datasets.

## What is the Replicator Object Extension?

[omni.replicator.object](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/tutorial_replicator_object.html) is an extension that generates synthetic data for model training without requiring any changes to the code. It can be used for various tasks, such as retail object detection and robotics. The extension takes a YAML description file as input, which describes a mutable scene or a hierarchy of stacked description files. It then outputs a description file along with graphics content, including RGB images, 2D/3D bounding boxes, segmentation masks, and more.

<img align="center" src="https://docs.omniverse.nvidia.com/isaacsim/latest/_images/overview.png" width="540">



## Learning Objectives

In this notebook, you will learn how to generate a synthetic dataset and use it for DINO training with TAO:

* Set up a configuration file for generating a synthetic dataset in Omniverse.
* Generate the synthetic training data based on the configuration file.
* Visualize the generated images along with their annotations.
* Train an object detection model with TAO using the freshly generated synthetic data.

At the end of this notebook, you will have generated a synthetic training set and an object detection model which you may deploy via [TAO-Deploy](https://github.com/NVIDIA/tao_deploy).

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires an environment variable, `$LOCAL_PROJECT_DIR`, set to the path of the user's workspace. The generated synthetic dataset is expected to reside in `$LOCAL_PROJECT_DIR/data/synthetic_dino`, while the collaterals generated by the TAO experiment are written to `$LOCAL_PROJECT_DIR/synthetic_dino/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json`  file. Here, we can map directories in which we save the data, specs, results and cache. You should configure it for your specific case so these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "synthetic_dino")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "synthetic_dino", "results")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/synthetic_dino

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)


In [None]:
!cat ~/.tao_mounts.json

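Because the launcher can only see directories that are mapped, it can be useful to confirm that every `source` in the mounts file exists on the host before launching a container. The following is a minimal sketch, not part of the TAO launcher; the helper names `missing_mount_sources` and `check_mounts_file` are hypothetical and only the `Mounts`/`source` layout written above is assumed.

```python
import json
import os


def missing_mount_sources(config):
    """Return the "source" paths in a mounts config that are not existing directories."""
    missing = []
    for mount in config.get("Mounts", []):
        source = os.path.expanduser(mount["source"])
        if not os.path.isdir(source):
            missing.append(source)
    return missing


def check_mounts_file(path="~/.tao_mounts.json"):
    """Load a mounts file and report which source directories are missing."""
    with open(os.path.expanduser(path)) as fp:
        return missing_mount_sources(json.load(fp))
```

Calling `check_mounts_file()` after the cell above should return an empty list when all mapped directories have been created.
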
## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+

Once you have installed the prerequisites, please log in to the docker registry nvcr.io by running the command below:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Generate synthetic dataset and verify the annotations <a class="anchor" id="head-2"></a>
### 2.1 Setup the configuration file for the Omniverse Replicator
In this section, we will use the "pallet" dataset for the tutorial. The following script will automatically generate the synthetic "pallet" dataset:

- Please note that all the USD contents are sourced from `omniverse://content.ov.nvidia.com/`.
- Make sure you have the necessary access before running the code.
- More details for the hyper-parameters can be found in the [omni.replicator.object](https://docs.omniverse.nvidia.com/isaacsim/latest/replicator_tutorials/tutorial_replicator_object.html) page.


In [None]:
config_file="""
omni.replicator.object:
  version: 0.2.16
  num_frames: 20
  seed: 100
  inter_frame_time: 1
  gravity: 10000
  position_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start:
      - -94.77713317047056
      - 0
      - -35.661244451558446
      end:
      - -94.77713317047056
      - 0
      - -35.661244451558446
  screen_height: 720
  focal_length: 14.228393962367306
  output_path: /tmpsrc/results
  horizontal_aperture: 20.955
  screen_width: 1080
  camera_parameters:
    far_clip: 100000
    focal_length: $(focal_length)
    horizontal_aperture: $(horizontal_aperture)
    near_clip: 0.1
    screen_height: $(screen_height)
    screen_width: $(screen_width)
  default_camera:
    count: 1
    camera_parameters: $(camera_parameters)
    transform_operators:
    - translate_global:
        distribution_type: harmonized
        harmonizer_name: position_H
    - rotateY: $[seed]*20
    - rotateX:
        distribution_type: range
        start: -15
        end: -25
    - translate:
        distribution_type: range
        start:
        - -40
        - -30
        - 400
        end:
        - 40
        - 30
        - 550
    type: camera
  distant_light:
    color:
      distribution_type: range
      end:
      - 1.3
      - 1.3
      - 1.3
      start:
      - 0.7
      - 0.7
      - 0.7
    count: 5
    intensity:
      distribution_type: range
      end: 600
      start: 150
    subtype: distant
    transform_operators:
    - rotateY:
        distribution_type: range
        end: 180
        start: -180
    - rotateX:
        distribution_type: range
        end: -10
        start: -40
    type: light
  dome_light:
    type: light
    subtype: dome
    color:
      distribution_type: range
      start:
      - 0.7
      - 0.7
      - 0.7
      end:
      - 1.3
      - 1.3
      - 1.3
    intensity:
      distribution_type: range
      start: 1000
      end: 3000
    transform_operators:
    - rotateX: 270
  plane:
    physics: collision
    type: geometry
    subtype: plane
    tracked: false
    transform_operators:
    - scale:
      - 5
      - 5
      - 5
  rotY_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start: 0
      end: 0
  translate_H:
    harmonizer_type: mutable_attribute
    mutable_attribute:
      distribution_type: range
      start:
      - 0
      - 60
      - 0
      end:
      - 0
      - 30
      - 0

  box:
    count: 3
    physics: rigidbody
    type: geometry
    subtype: mesh
    tracked: true
    transform_operators:
    - translate_global:
        distribution_type: harmonized
        harmonizer_name: position_H
    - translate_pallet:
        distribution_type: harmonized
        harmonizer_name: translate_H
    - rotateY:
        distribution_type: harmonized
        harmonizer_name: rotY_H
    - translate:
      - 120 * ($[index])
      - 20
      - 0
    - rotateXYZ:
      - 0
      - -90
      - -90
    - scale:
      - 12
      - 10
      - 6
    usd_path:
      distribution_type: set
      values:
      - omniverse://content.ov.nvidia.com/NVIDIA/Assets/DigitalTwin/Assets/Warehouse/Shipping/Cardboard_Boxes/White_A/WhiteCorrugatedBox_A01_10x10x10cm_PR_NVD_01.usd
      - omniverse://content.ov.nvidia.com/NVIDIA/Assets/DigitalTwin/Assets/Warehouse/Shipping/Cardboard_Boxes/Cube_A/CubeBox_A01_10cm_PR_NVD_01.usd
  warehouse:
    type: geometry
    subtype: mesh
    usd_path: omniverse://content.ov.nvidia.com/NVIDIA/Assets/Isaac/2023.1.1/Isaac/Environments/Simple_Warehouse/warehouse_with_forklifts.usd
    transform_operators:
    - translate:
      - -200
      - 0.1
      - 0
    - rotateXYZ:
      - 0
      - -90
      - -90
    - scale:
      - 100
      - 100
      - 100

  output_switches:
    images: True
    labels: True
    descriptions: False
    3d_labels: False
    segmentation: False
"""

In [None]:
# Define the configuration file and save to the local.
import yaml
yaml_file = yaml.safe_load(config_file)
with open(os.path.join(os.getenv("HOST_DATA_DIR", os.getcwd()), 'config.yaml'), 'w') as outfile:
    yaml.dump(yaml_file, outfile, default_flow_style=False)

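To get a quick overview of what a description file declares, the parsed configuration can be summarized by element type. This is an illustrative sketch only: the helper name `summarize_scene` is hypothetical, and it assumes, as in the config above, that scene elements are dictionaries carrying a `type` key under `omni.replicator.object`.

```python
from collections import Counter


def summarize_scene(config):
    """Count scene entries per `type` under the `omni.replicator.object` key."""
    scene = config["omni.replicator.object"]
    counts = Counter()
    for name, entry in scene.items():
        # Only dictionary entries that declare a `type` are counted;
        # scalars (e.g. num_frames) and harmonizers have no `type` key.
        if isinstance(entry, dict) and "type" in entry:
            counts[entry["type"]] += 1
    return dict(counts)
```

For the configuration above, `summarize_scene(yaml_file)` counts one camera entry, two light entries, and three geometry entries. Note that it counts entries, not the `count` field of each entry.
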
* Note that you can simply adjust the hyper-parameters in the config file to generate synthetic data for your own use case.
* **If you are using content from Omniverse Nucleus servers, you need to set your own OMNI_USER and OMNI_PASS variables.**

In [None]:
# Define the Omniverse password
%env OMNI_PASS=YOUR_OWN_OMNI_PASSWORD

### 2.2 Launch the synthetic data generation
Launch the synthetic data generation with the Omniverse replicator object extension inside the container.

The following example is one of the synthetic data generation scenes. You can modify the config file to generate different scenes with various objects, backgrounds, and numbers of target objects.
* Note that the current synthetic data generation pipeline only supports a single GPU.

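As a small illustration of modifying the config, the sketch below changes the number of frames and the number of boxes on a copy of the parsed configuration. The helper name `with_overrides` and its parameters are hypothetical; only the `num_frames` and `box.count` keys from the config above are assumed. The result can be written back with `yaml.dump`, as in the earlier cell, before launching the generation.

```python
import copy


def with_overrides(config, num_frames=None, box_count=None):
    """Return a copy of the config with selected values replaced."""
    # Deep-copy so the original parsed configuration stays untouched.
    updated = copy.deepcopy(config)
    scene = updated["omni.replicator.object"]
    if num_frames is not None:
        scene["num_frames"] = num_frames
    if box_count is not None:
        scene["box"]["count"] = box_count
    return updated
```
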
In [None]:
!docker run --gpus device=0 -it \
    --entrypoint /bin/bash \
    --network host \
    -v $HOST_DATA_DIR:/tmpsrc \
    -e OMNI_USER='$omni-api-token' \
    -e OMNI_PASS=$OMNI_PASS \
    nvcr.io/nvidia/isaac-sim:4.0.0 \
    -c "apt-get update && apt-get install libglib2.0-dev -y && bash isaac-sim.sh --no-window --allow-root --/windowless=True --allow-root --/log/outputStreamLevel=fatal --/app/extensions/fastImporter/enabled=false --enable omni.replicator.object --/config/file=/tmpsrc/config.yaml --/binary=True"

### 2.3 Visualize the generated data
In this section, we run a simple visualizer on the generated synthetic data. The generation tool produces the synthetic images and the corresponding label files (plain-text `.txt` files, as read in section 2.4) that contain the training annotation information.

The generated data is stored under `$HOST_DATA_DIR/results`, which is the container's `/tmpsrc/results` set by `output_path` in the config file. It can be visualized with the following commands.

In [None]:
# Install the matplotlib dependencies.
!pip3 install "matplotlib>=3.3.3, <4.0"
!pip3 install opencv-python==4.8.0.74
!pip3 install numpy==1.24.4

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg']

def visualize_images(output_path, num_cols=4, num_images=10):
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[40,30], squeeze=False)
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img) 

In [None]:
# Visualizing the sample images.
# Note that the sample configuration is for demonstration and is not meant to produce SOTA (state-of-the-art) accuracy.
IMAGE_DIR = os.path.join(os.environ['HOST_DATA_DIR'], 'results', 'images')
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

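Before inspecting annotations, it can help to confirm that every generated image has a label file and vice versa. The sketch below assumes the naming convention that the annotation visualizer in section 2.4 relies on, namely that an image `NAME.jpg` is paired with a label `NAME.txt`; the helper name `find_unpaired` is hypothetical.

```python
import os


def find_unpaired(images_dir, labels_dir, image_ext=".jpg", label_ext=".txt"):
    """Return (images without a label, labels without an image) as sorted stem lists."""
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
                   if f.lower().endswith(image_ext)}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(labels_dir)
                   if f.lower().endswith(label_ext)}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Two empty lists mean that images and labels line up one to one.
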
### 2.4 Visualize and Verify the Training Data Annotation 
In this section, we visualize the annotations for the generated synthetic data by using the corresponding label files.

In [None]:
# Simple grid visualizer
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import os
import cv2
from math import ceil
valid_image_ext = ['.jpg']

def visualize_images_with_annotations(images_path, labels_path, num_cols=4, num_images=10):
    
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[40,30], squeeze=False)
    f.tight_layout()
    
    images=[]
    labels=[]
    # Pair each label file with the image that shares its base name.
    for label_file in sorted(os.listdir(labels_path)):
        if not label_file.endswith('.txt'):
            continue
        labels.append(os.path.join(labels_path, label_file))
        images.append(os.path.join(images_path, label_file.replace('.txt', '.jpg')))
        
    for idx, img_path in enumerate(images[:num_images]):
        # Copy the image so cv2 can draw on it even if imread returns a read-only array.
        img = plt.imread(img_path).copy()
        with open(labels[idx], 'r') as fp:
            for line in fp:
                sp = line.split()
                if len(sp) < 8:
                    continue  # skip blank or malformed lines
                # Fields 4-7 hold the box corners; float() tolerates values such as "12.0".
                xmin, ymin, xmax, ymax = (int(float(v)) for v in sp[4:8])
                cv2.rectangle(img, (xmin,ymin), (xmax,ymax), (0,255,0), 2)
        col_id = idx % num_cols
        row_id = idx // num_cols
        axarr[row_id, col_id].imshow(img) 

In [None]:
# Visualizing the sample images along with their annotations
IMAGE_DIR = os.path.join(os.environ['HOST_DATA_DIR'], 'results', 'images')
LABELS_DIR = os.path.join(os.environ['HOST_DATA_DIR'], 'results', 'labels')
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images_with_annotations(IMAGE_DIR, LABELS_DIR, num_cols=COLS, num_images=IMAGES)

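The visualizer above reads the bounding box from whitespace-separated fields 4 to 7 of each label line. As a sketch of that parsing step in isolation, the hypothetical helper below returns the box together with the first field, which is assumed here to be the class name, as in KITTI-style labels; the source does not state the label format explicitly, so treat that as an assumption.

```python
def parse_label_line(line):
    """Split one label line into (class_name, (xmin, ymin, xmax, ymax))."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    # Fields 4-7 hold the box corners; go through float() so values such as
    # "12.0" are accepted, then truncate to whole pixels.
    xmin, ymin, xmax, ymax = (int(float(v)) for v in fields[4:8])
    return fields[0], (xmin, ymin, xmax, ymax)
```

A helper like this makes it easy to check labels for boxes that fall outside the image or have `xmax <= xmin` before training.
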
## 3. Launch the DINO TAO object detection training using the generated synthetic dataset
This section shows how to use the synthetic dataset to launch training.
More details about the hyper-parameters and the end-to-end training pipeline can be found in the [DINO Notebook](https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/dino/dino.ipynb).

### 3.1 Download the pre-trained model
We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP on the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"


# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
# Pull pretrained model from NGC
!mkdir -p $HOST_RESULTS_DIR/pretrained_models
!ngc registry model download-version nvidia/tao/retail_object_detection:trainable_binary_v2.1.1 --dest $HOST_RESULTS_DIR/pretrained_models
print("Check if model is downloaded into dir.")
!ls -l $HOST_RESULTS_DIR/pretrained_models/retail_object_detection_vtrainable_binary_v2.1.1/

### 3.2 Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * train_data_sources:
        * image_dir: the root directory for train images
        * json_file: annotation file for train data. Must be in COCO JSON format.
    * val_data_sources: 
        * image_dir: the root directory for validation images
        * json_file: annotation file for validation data. Must be in COCO JSON format.
    * num_classes: number of classes of your training data
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
* model: configure the model setting
    * pretrained_backbone_path: path to the pretrained backbone model. ResNet50, FAN-variants, and GCViT-variants are supported
    * num_feature_levels: number of feature levels used from backbone
    * dec_layers: number of decoder layers
    * enc_layers: number of encoder layers
    * num_queries: number of queries for the model
    * num_select: number of top-k proposals to select from
    * use_dn: flag to enable denoising during training
    * dropout_ratio: dropout ratio
* train: configure the training hyperparameters
    * pretrained_model_path: path to the pretrained model to load before training
    * freeze: freezes the listed modules during training
    * num_gpus: number of gpus 
    * num_nodes: number of nodes (num_nodes=1 for single node)
    * val_interval: validation interval
    * optim:
        * lr_backbone: learning rate for backbone
        * lr: learning rate for the rest of the model
        * lr_steps: learning rate decay step milestone (MultiStep)
    * num_epochs: number of epochs
    * activation_checkpoint: recompute activations in the backward to save GPU memory. Default is `True`.
    * precision: If set to fp16, the training is run on Automatic Mixed Precision (AMP)
    * distributed_strategy: Default is `ddp`. `ddp_sharded` is also supported.

See the [TAO documentation - DINO](https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/dino.html) to get all the parameters that are configurable.

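The spec requires annotations in COCO JSON format. As an illustration of what such a file contains, the sketch below builds a COCO-style dictionary from per-image corner boxes. It is not the TAO or Replicator implementation: the helper name `labels_to_coco` and its input layout are hypothetical, all images are assumed to share one resolution (for example the `screen_width` and `screen_height` from the generation config), and category ids are assumed to start at 1, which may need adjusting to match your spec.

```python
def labels_to_coco(labels, width, height, class_names):
    """Build a COCO-style dict.

    labels: {image_file_name: [(class_name, (xmin, ymin, xmax, ymax)), ...]}
    """
    category_ids = {name: i + 1 for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in category_ids.items()],
    }
    for image_id, file_name in enumerate(sorted(labels), start=1):
        coco["images"].append(
            {"id": image_id, "file_name": file_name, "width": width, "height": height}
        )
        for class_name, (xmin, ymin, xmax, ymax) in labels[file_name]:
            box_w, box_h = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": image_id,
                "category_id": category_ids[class_name],
                "bbox": [xmin, ymin, box_w, box_h],  # COCO uses [x, y, width, height]
                "area": box_w * box_h,
                "iscrowd": 0,
            })
    return coco
```

The returned dictionary can be written with `json.dump` and referenced as `json_file` in the dataset section of the spec.
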

In [None]:
!cat $HOST_SPECS_DIR/train_synthetic.yaml

### 3.3 Run object detection TAO training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models
* Evaluation uses COCO metrics. For more info, please refer to: https://cocodataset.org/#detection-eval

* To speed up training, try setting `train.precision=fp16` for mixed precision training

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.
# The data is saved here
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results
%env MODEL_DIR = /model

In [None]:
print("For multi-GPU, change num_gpus in train_synthetic.yaml based on your machine.")
print("For multi-node, change num_gpus and num_nodes in train_synthetic.yaml based on your machine.")
# If you face out of memory issue, you may reduce the batch size in the spec file by passing dataset.batch_size=2
!tao model dino train \
          -e $SPECS_DIR/train_synthetic.yaml \
          results_dir=$RESULTS_DIR/

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/dino_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/retail_object_detection.pth
!ls -ltrh $HOST_RESULTS_DIR/train/retail_object_detection.pth

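The commented-out lines in the cell above select a checkpoint by epoch with `ls` and `grep`. A pure-Python sketch of the same idea follows. The helper name `find_checkpoint` is hypothetical, and it assumes, based on the `epoch_$NUM_EPOCH` pattern and the `029` example above, that checkpoint file names contain `epoch_` followed by a zero-padded three-digit epoch number.

```python
import glob
import os


def find_checkpoint(train_dir, epoch):
    """Return the first .pth file whose name contains `epoch_<NNN>`, or None."""
    tag = f"epoch_{epoch:03d}"
    matches = sorted(
        path for path in glob.glob(os.path.join(train_dir, "*.pth"))
        if tag in os.path.basename(path)
    )
    return matches[0] if matches else None
```
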
This notebook has come to an end.
More details on the end-to-end training and inference pipeline can be found in the [DINO Notebook](https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/dino/dino.ipynb) and the [Retail Object Detection Notebook](https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/retail_object_detection/retail_object_detection.ipynb).

Note that the model trained above is not a competitive object detection model. We trained for only a few epochs on very little data without any pre-trained weights. Our goal was to show how to generate synthetic data using the extension and how to train an object detection model on those images. The decreasing training error indicates that the pipeline works as intended.