# Synthetic Data Generation and Training Workflow with Warehouse Sim Ready Assets

This notebook is the second part of the SDG and Training Workflow. Here, we focus on training an object detection network with the TAO Toolkit.

A high-level overview of the steps:
* Pulling the TAO Docker container
* Training a DetectNet_v2 model with the generated synthetic data
* Visualizing model performance on sample real-world data

This notebook is very similar to the cloud training notebook; only the mounted directories and paths for the Docker containers differ. The data, the model, and the training, evaluation and inference steps are identical.

#### If Isaac Sim is installed locally, ensure that data generation is complete before continuing: run the `generate_data.sh` script in this folder after setting `ISAAC_SIM_PATH` in the script to the location where Isaac Sim is installed on your workstation

### Table of Contents

This notebook shows an example use case of object detection using DetectNet_v2 in the Train Adapt Optimize (TAO) Toolkit. We will train the model with the synthetic data generated previously.

1. [Set up TAO via Docker container](#head-1)
2. [Download Pretrained model](#head-2)
3. [Convert Dataset to TFRecords for TAO](#head-3)
4. [Provide training specification](#head-4)
5. [Run TAO training](#head-5)
6. [Evaluate trained model](#head-6)
7. [Visualize Model Predictions on Real World Data](#head-7)
8. [Next Steps](#head-8)

## 1. Set up TAO via Docker Container <a class="anchor" id="head-1"></a>

* We will follow the prerequisites section of the [instructions](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#running-tao-toolkit) for using the TAO Toolkit. Make sure the prerequisite steps are completed: installing `docker` and the `nvidia container toolkit`, and running `docker login nvcr.io`

* The Docker container used for training is pulled by the cells below; `docker login nvcr.io` must have succeeded for the container to be pulled from NGC


In [10]:
import os
%env DOCKER_REGISTRY=nvcr.io
%env DOCKER_NAME=nvidia/tao/tao-toolkit
# Tag of the TensorFlow TAO container. The comment sits on its own line because %env stores trailing text as part of the value.
%env DOCKER_TAG=4.0.0-tf1.15.5

%env DOCKER_CONTAINER=nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5

env: DOCKER_REGISTRY=nvcr.io
env: DOCKER_NAME=nvidia/tao/tao-toolkit
env: DOCKER_TAG=4.0.0-tf1.15.5
env: DOCKER_CONTAINER=nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
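The `DOCKER_CONTAINER` value above is the registry, image name and tag joined into one reference. A minimal sketch of that composition follows; `build_image_ref` is a hypothetical helper written for illustration and is not part of TAO or Docker.

```python
def build_image_ref(registry: str, name: str, tag: str) -> str:
    """Join registry, image name and tag into `registry/name:tag`."""
    # Strip stray slashes so the parts can be joined safely.
    return f"{registry.strip('/')}/{name.strip('/')}:{tag}"


image_ref = build_image_ref("nvcr.io", "nvidia/tao/tao-toolkit", "4.0.0-tf1.15.5")
print(image_ref)
```

Building the reference from its parts keeps the three variables in sync if the tag is changed later.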


## 2. Download Pretrained Model <a class="anchor" id="head-2"></a>

* We will use the `detectnet_v2` Object Detection model with a `resnet18` backbone
* Make sure the `LOCAL_PROJECT_DIR` environment variable has the path of this cloned repository in the cell below


In [11]:
# os.environ["LOCAL_PROJECT_DIR"] = "<LOCAL_PATH_OF_CLONED_REPO>"
os.environ["LOCAL_PROJECT_DIR"] = os.path.dirname(os.getcwd()) # This is the location of the root of the cloned repo
print(os.environ["LOCAL_PROJECT_DIR"])

/home/shinfang-ovx/synthetic_data_generation_training_workflow


In [12]:
!wget --quiet --show-progress --progress=bar:force:noscroll --auth-no-challenge --no-check-certificate \
        https://api.ngc.nvidia.com/v2/models/nvidia/tao/pretrained_detectnet_v2/versions/resnet18/files/resnet18.hdf5 \
        -P  $LOCAL_PROJECT_DIR/local/training/tao/pretrained_model/
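Because `wget --quiet` prints little when a download fails, it can help to confirm that the weights file landed where the later steps expect it. The sketch below is one possible check; `find_pretrained_model` is a hypothetical helper, and the file name `resnet18.hdf5` is taken from the URL above.

```python
import os


def find_pretrained_model(project_dir: str, filename: str = "resnet18.hdf5") -> str:
    """Return the path of the downloaded weights, or raise if missing or empty."""
    path = os.path.join(
        project_dir, "local", "training", "tao", "pretrained_model", filename
    )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Pretrained model not found: {path}")
    if os.path.getsize(path) == 0:
        # A zero-byte file usually indicates an interrupted or rejected download.
        raise ValueError(f"Pretrained model is empty: {path}")
    return path
```

In this notebook it could be called as `find_pretrained_model(os.environ["LOCAL_PROJECT_DIR"])` right after the download cell.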



## 3. Convert Dataset to TFRecords for TAO <a class="anchor" id="head-3"></a>

* The `Detectnet_v2` model in TAO expects training data in the form of TFRecords
* We can convert the KITTI-format dataset generated in Part 1 with the `detectnet_v2 dataset_convert` tool provided with the TAO Toolkit
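For orientation, each line of a KITTI label file describes one object with 15 space-separated fields: class name, truncation, occlusion, alpha, the 2D box (xmin, ymin, xmax, ymax), then 3D dimensions, location and rotation. The sketch below parses the fields relevant to 2D detection; the sample line and the `KittiObject` type are made up for illustration and are not taken from the generated dataset.

```python
from typing import NamedTuple


class KittiObject(NamedTuple):
    class_name: str
    truncated: float
    occluded: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def parse_kitti_line(line: str) -> KittiObject:
    """Parse one KITTI label line, keeping the fields used for 2D detection."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"Expected at least 15 fields, got {len(fields)}")
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return KittiObject(
        fields[0], float(fields[1]), int(float(fields[2])), xmin, ymin, xmax, ymax
    )


# Hypothetical label line: class, truncation, occlusion, alpha, bbox (4), 3D fields (7).
sample = "bottle 0.00 0 0.00 100.00 120.00 180.00 260.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
obj = parse_kitti_line(sample)
print(obj.class_name, obj.xmax - obj.xmin, obj.ymax - obj.ymin)
```

Spot-checking a few label files this way before conversion can catch empty or malformed annotations early.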


In [13]:
print("Converting Tfrecords for kitchenware distractors dataset")

!mkdir -p $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/distractors_kitchen && rm -rf $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/distractors_kitchen/*

!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
                   detectnet_v2 dataset_convert \
                  -d /workspace/tao-experiments/local/training/tao/specs/tfrecords/distractors_kitchen.txt \
                  -o /workspace/tao-experiments/local/training/tao/tfrecords/distractors_kitchen/

Converting Tfrecords for kitchenware distractors dataset

=== TAO Toolkit TensorFlow ===

NVIDIA Release 4.0.0-TensorFlow (build )
TAO Toolkit Version 4.0.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/tao-toolkit-software-license-agreement

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for TAO Toolkit.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

Using TensorFlow backend.
2024-10-28 10:28:28.684320: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2024-10-28 10:28:32,367 [INFO] iva.detectnet_v

In [14]:
print("Converting Tfrecords for kitchenware with additional distractors")

!mkdir -p $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/distractors_additional && rm -rf $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/distractors_additional/*

!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
                   detectnet_v2 dataset_convert \
                  -d /workspace/tao-experiments/local/training/tao/specs/tfrecords/distractors_additional.txt \
                  -o /workspace/tao-experiments/local/training/tao/tfrecords/distractors_additional/

Converting Tfrecords for kitchenware with additional distractors

Using TensorFlow backend.
2024-10-28 10:28:36.576281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2024-10-28 10:28:40,193 [INFO] iva.det

In [15]:
print("Converting Tfrecords for kitchenware without distractors")

!mkdir -p $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/no_distractors && rm -rf $LOCAL_PROJECT_DIR/local/training/tao/tfrecords/no_distractors/*

!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
                   detectnet_v2 dataset_convert \
                  -d /workspace/tao-experiments/local/training/tao/specs/tfrecords/no_distractors.txt \
                  -o /workspace/tao-experiments/local/training/tao/tfrecords/no_distractors/

Converting Tfrecords for kitchenware without distractors

Using TensorFlow backend.
2024-10-28 10:28:44.389921: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2024-10-28 10:28:47,997 [INFO] iva.detectnet_v2.dataio.

## 4. Provide Training Specification File <a class="anchor" id="head-4"></a>

* The spec file for training with TAO is provided at `$LOCAL_PROJECT_DIR/local/training/tao/specs/training/resnet18_distractors.txt`
* The TFRecords and the synthetic data generated in the previous steps are referenced under the `dataset_config` parameter of the file
* Other parameters such as `augmentation_config`, `model_config` and `postprocessing_config` can be adjusted. Refer to the [DetectNet_v2 documentation](https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/detectnet_v2.html) for detailed guidance on adjusting the parameters in the spec file
* To train our model to detect `kitchenware`, the provided `spec` file can be used directly


In [16]:
!cat $LOCAL_PROJECT_DIR/local/training/tao/specs/training/resnet18_distractors.txt

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/local/training/tao/tfrecords/distractors_kitchen/*"
    image_directory_path: "/workspace/tao-experiments/kitchenware_sdg/kitchenware_data/distractors_kitchen/Camera"
  }
  
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/local/training/tao/tfrecords/distractors_additional/*"
    image_directory_path: "/workspace/tao-experiments/kitchenware_sdg/kitchenware_data/distractors_additional/Camera"
  }
  
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/local/training/tao/tfrecords/no_distractors/*"
    image_directory_path: "/workspace/tao-experiments/kitchenware_sdg/kitchenware_data/no_distractors/Camera"
  }
  
  image_extension: "png"
  
  target_class_mapping {
    key: "bottle"
    value: "bottle"
  }
  
  target_class_mapping {
    key: "ladle"
    value: "ladle"
  }
  
  target_class_mapping {
    key: "wok"
    value: "wok"
  }
  
  target_class_mapping {

### Hyperparameters can be set in the `spec` file. Adjust batch size parameter depending on the VRAM of your GPU 

* Increasing the number of epochs keeps reducing the number of false positives on real-world images. The mAP changes little after ~250 epochs, and a model trained for about that long is usually the best one for the given dataset
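The `target_class_mapping` entries in the spec file determine which classes the model is trained on, so it can be useful to list them before training. The sketch below pulls them out with a regular expression. This is a simplification that assumes the `key`/`value` layout shown in the spec above; it is not a full parser for the spec format, and `read_class_mapping` is a hypothetical helper.

```python
import re

# Matches: target_class_mapping { key: "<source>" value: "<target>" }
MAPPING_RE = re.compile(
    r'target_class_mapping\s*\{\s*key:\s*"([^"]+)"\s*value:\s*"([^"]+)"\s*\}'
)


def read_class_mapping(spec_text: str) -> dict:
    """Return {source class: target class} from target_class_mapping blocks."""
    return dict(MAPPING_RE.findall(spec_text))


# Shortened excerpt in the same layout as the training spec.
spec_text = '''
dataset_config {
  target_class_mapping {
    key: "bottle"
    value: "bottle"
  }
  target_class_mapping {
    key: "ladle"
    value: "ladle"
  }
}
'''
print(read_class_mapping(spec_text))
```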

## 5. Run TAO Training <a class="anchor" id="head-5"></a>

* `$LOCAL_PROJECT_DIR` is mounted into the TAO Docker container at `/workspace/tao-experiments` for training; it contains all the data, the pretrained model and the spec files (training and inference) needed
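Because of this mount, every path passed to a TAO command must be written as the container sees it, not as it appears on the host. The sketch below shows the translation under the assumption that the mount target is `/workspace/tao-experiments`, as in the `docker run` commands in this notebook; `to_container_path` and the `/home/user/repo` path are hypothetical.

```python
from pathlib import PurePosixPath

# Mount target used by the `-v $LOCAL_PROJECT_DIR:/workspace/tao-experiments` flag.
CONTAINER_ROOT = PurePosixPath("/workspace/tao-experiments")


def to_container_path(host_path: str, project_dir: str) -> str:
    """Translate a host path under project_dir to its path inside the container."""
    # relative_to raises ValueError if host_path is outside project_dir,
    # which would mean the file is not visible inside the container.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(project_dir))
    return str(CONTAINER_ROOT / relative)


print(to_container_path(
    "/home/user/repo/local/training/tao/specs/training/resnet18_distractors.txt",
    "/home/user/repo",
))
```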

#### Ensure that no `_warning.json` file exists in the `$LOCAL_PROJECT_DIR/local/training/tao/tfrecords` sub-folders (`distractors_additional`, `distractors_kitchen` and `no_distractors`)
* Delete any `_warning.json` files before beginning training
* TAO training won't begin if the structure of the `tfrecords` folders is not as expected
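The cleanup described above can be scripted. The sketch below walks the `tfrecords` folder, reports any file whose name ends in `_warning.json`, and deletes it only when `dry_run=False`; `remove_warning_files` is a hypothetical helper, not a TAO utility.

```python
import os


def remove_warning_files(tfrecords_dir: str, dry_run: bool = True) -> list:
    """Find `*_warning.json` files under tfrecords_dir; delete them unless dry_run."""
    found = []
    for root, _dirs, files in os.walk(tfrecords_dir):
        for name in files:
            if name.endswith("_warning.json"):
                path = os.path.join(root, name)
                found.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(found)
```

A cautious way to use it is to call it once with the default `dry_run=True` on `os.path.join(os.environ["LOCAL_PROJECT_DIR"], "local/training/tao/tfrecords")`, review the returned list, and then call it again with `dry_run=False`.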

In [17]:
# Setting up env variables for cleaner command line commands.
%env KEY=tlt_encode
%env NUM_GPUS=1

env: KEY=tlt_encode
env: NUM_GPUS=1


* TAO training can be stopped and resumed from the checkpoints saved according to the `checkpoint_interval` parameter in the `spec` file
* TensorBoard visualization can be used with TAO by following these [instructions](https://docs.nvidia.com/tao/tao-toolkit/text/tensorboard_visualization.html#visualizing-using-tensorboard)
* Training results are written to `$LOCAL_PROJECT_DIR/local/training/tao/detectnet_v2/resnet18_kitchenware`, the results folder specified with the `-r` flag in the command below

In [18]:
!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
            detectnet_v2 train -e /workspace/tao-experiments/local/training/tao/specs/training/resnet18_distractors.txt \
            -r /workspace/tao-experiments/local/training/tao/detectnet_v2/resnet18_kitchenware -k $KEY --gpus $NUM_GPUS


Using TensorFlow backend.
2024-10-28 10:44:18.320754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2024-10-28 10:44:22,637 [INFO] iva.common.logging.logging: Log file already exists at /workspace/tao-ex

## 6. Evaluate Trained Model <a class="anchor" id="head-6"></a>

* While generating the `tfrecords`, part of the generated data (14% of the total) was held out as a validation set
* We will run model evaluation on this data to obtain metrics
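To get a feel for how much data the 14% hold-out represents, the arithmetic can be sketched as below. The dataset size of 5000 is a made-up figure, and the rounding rule is an assumption; the exact partitioning TAO applies may differ slightly.

```python
def split_counts(total_images: int, val_percent: float = 14.0) -> tuple:
    """Return (train_count, val_count) for a percentage hold-out."""
    # Rounding to the nearest image is an assumption made for this illustration.
    val_count = round(total_images * val_percent / 100)
    return total_images - val_count, val_count


# Hypothetical dataset size, for illustration only.
train_count, val_count = split_counts(5000)
print(train_count, val_count)
```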

In [19]:
!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
            detectnet_v2 evaluate -e /workspace/tao-experiments/local/training/tao/specs/training/resnet18_distractors.txt \
            -m /workspace/tao-experiments/local/training/tao/detectnet_v2/resnet18_kitchenware/weights/model.tlt \
            -k $KEY --gpus $NUM_GPUS


Using TensorFlow backend.
2024-10-29 02:10:42.561616: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
2024-10-29 02:10:46,284 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /w

## 7. Visualize Model Performance on Real World Data <a class="anchor" id="head-7"></a>

* Let's visualize the model predictions on a few sample real-world images next
* We will use kitchenware images in a warehouse from the `LOCO` dataset to understand whether the model is capable of performing real-world detections
* Additional images can be placed under the `loco_kitchenware` folder of this project. The input folder is specified with the `-i` flag in the command below
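Before launching inference, it can be worth confirming that the input folder actually contains images. The sketch below lists image files by extension; `list_images` is a hypothetical helper, and the extension list is an assumption rather than the set of formats TAO accepts.

```python
import os

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(folder: str) -> list:
    """Return the sorted image file names found directly inside folder."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    )
```

For the command below, the folder to check on the host would be `os.path.join(os.environ["LOCAL_PROJECT_DIR"], "images/sample_synthetic")`.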

In [22]:
!docker run -it --rm --gpus all -v $LOCAL_PROJECT_DIR:/workspace/tao-experiments $DOCKER_CONTAINER \
                            detectnet_v2 inference -e /workspace/tao-experiments/local/training/tao/specs/inference/new_inference_specs.txt \
                            -o /workspace/tao-experiments/local/training/tao/detectnet_v2/resnet18_kitchenware/1200_model_synthetic \
                            -i /workspace/tao-experiments/images/sample_synthetic \
                            -k $KEY


Using TensorFlow backend.
2024-10-29 03:51:18.991548: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
INFO: Merging specification from /workspace/tao-experiments/local/training/tao/specs/inference/new_infe

In [23]:
from glob import glob
from IPython.display import Image, display

# Annotated images are read from the folder passed to `-o` in the inference command.
# The `images_annotated` sub-folder name is an assumption about the TAO output layout.
results_dir = os.path.join(os.environ["LOCAL_PROJECT_DIR"], "local/training/tao/detectnet_v2/resnet18_kitchenware/1200_model_synthetic/images_annotated")

# Show the first few annotated images that exist rather than hard-coded file names,
# which raise FileNotFoundError when a file is missing.
image_paths = sorted(glob(os.path.join(results_dir, "*")))[:5]

if image_paths:
    display(*[Image(filename=image_path) for image_path in image_paths])
else:
    print(f"No annotated images found in {results_dir}")

## 8. Next Steps <a class="anchor" id="head-8"></a>

#### Generating Synthetic Data for your use case:

* Make changes to the Domain Randomization in the Synthetic Data Generation script (`kitchenware_sdg/standalone_kitchenware_sdg.py`)
* Add additional objects of interest to the scene (similar to how `kitchenware` objects are added, you can add `forklifts`, `ladders`, etc.) to generate data
* Use [different](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#downloading-the-models) models for training with TAO (for object detection, you can use `YOLO`, `SSD`, `EfficientDet`)
* Replicator provides Semantic Segmentation, Instance Segmentation, Depth and various other ground truth annotations along with RGB. You can also write your own ground truth annotator (e.g. Pose Estimation: refer to this [sample](https://docs.omniverse.nvidia.com/isaacsim/latest/tutorial_replicator_offline_pose_estimation.html)). These can be used for training a model with a framework of your choice
* Explore using `Synthetic + Real` data for training a network. Synthetic data can be particularly useful for generating more data around particular corner cases


#### Deploying Trained Models:

* After obtaining satisfactory results from training, you can further optimize your model for deployment with pruning and quantization-aware training (QAT).
* TAO models can be deployed directly on Jetson with Isaac ROS or DeepStream, which keeps your end-to-end pipeline optimized (data acquisition -> model inference -> results)