# 3D Multi-Camera Detection & Tracking using TAO Sparse4D

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## What is Sparse4D?

[Sparse4D](https://arxiv.org/pdf/2311.11722) is a state-of-the-art Multi-Camera Multi-Target 3D (MTMC-3D) detection & tracking model. We adopt & modify Sparse4D from the AV use case for indoor environments with larger spaces, such as warehouses, retail stores, and hospitals, which usually have static cameras.

Our pretrained models support classes such as Person, Humanoids, Boxes, Pallets, Crates & Autonomous Mobile Robots (AMRs). We utilize the ResNet-101 backbone for training.

### Sample prediction of Sparse4D model
<img width="800" align="center" src="https://raw.githubusercontent.com/vpraveen-nv/model_card_images/main/cv/purpose_built_models/sparse4d/sample_output.gif">


The above output shows a warehouse environment recorded by four temporally synchronized cameras. The scene includes a variety of real-world warehouse assets such as Person, Humanoids, Boxes, Pallets, Crates, and Autonomous Mobile Robots (AMRs). Our model accurately detects each object, fits a 3D bounding box around it, and assigns a unique ID per object class across all cameras.


## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and train a Sparse4D model on the MTMC Tracking 2025 dataset
* Evaluate the trained model
* Run inference with the trained model and visualize the result
* Export the trained model to a .onnx file for deployment to DeepStream

At the end of this notebook, you will have generated a trained `Sparse4D` model
which you may deploy via [DeepStream](https://developer.nvidia.com/deepstream-sdk).

## Table of Contents

This notebook shows an example use case of Sparse4D using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate a trained model](#head-5)
6. [Visualize inferences](#head-6)
7. [Export model to ONNX](#head-7)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires you to set an environment variable called `LOCAL_PROJECT_DIR` to the path of your workspace. The dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment are written to `$LOCAL_PROJECT_DIR/sparse4d/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the container, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we can map the directories in which we save the data, specs, results and cache. Configure it for your setup so that these directories are visible to the Docker container.
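
Because the same directory has one path on the host and another inside the container, it can help to see the translation spelled out. The following is a minimal sketch, not part of the TAO launcher: `to_container_path` is a hypothetical helper, and the `/home/user/project` paths are made-up examples that mirror the `source`/`destination` pairs this notebook writes to `~/.tao_mounts.json`.

```python
import posixpath

# Hypothetical mount table; only the "destination" values match this notebook.
MOUNTS = [
    {"source": "/home/user/project/data", "destination": "/data"},
    {"source": "/home/user/project/sparse4d/results", "destination": "/results"},
]

def to_container_path(host_path, mounts):
    """Translate a host path to its in-container path, or return None if unmapped."""
    host_path = posixpath.normpath(host_path)
    # Try the longest source first so that nested mounts resolve to the closest match.
    for mount in sorted(mounts, key=lambda m: len(m["source"]), reverse=True):
        source = posixpath.normpath(mount["source"])
        if host_path == source:
            return mount["destination"]
        if host_path.startswith(source + "/"):
            relative = host_path[len(source):].lstrip("/")
            return posixpath.join(mount["destination"], relative)
    return None

print(to_container_path("/home/user/project/data/anno_pkls", MOUNTS))  # /data/anno_pkls
```

A path that returns `None` here is not covered by any mount, which is the usual reason a `tao` command cannot find a file that clearly exists on the host.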


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=./

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "sparse4d", "results")
os.environ["HOST_MODEL_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "sparse4d", "model")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/sparse4d

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR
! mkdir -p $HOST_MODEL_DIR 

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_MODEL_DIR"],
           "destination": "/model"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json

In [None]:
!tao info --verbose

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>

In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

We will use the `MTMC Tracking 2025` dataset in this fine-tuning tutorial for training, evaluation & inference. This is a synthetic dataset generated with NVIDIA Omniverse & belongs to the NVIDIA PhysicalAI smart spaces category. You may expand the dataset by picking multiple scenes from the MTMC Tracking 2025 dataset & keeping unique, disjoint sets for training, validation & testing.
 
The dataset consists of multiple scenes such as warehouse, retail, hospital, etc. In this tutorial, we use the warehouse scene `Warehouse_014` from the training set, which consists of temporally consistent multi-camera videos (.mp4), depth maps (.h5), camera calibration defined in OV coordinates & ground truth containing 3D bounding boxes & object IDs for all frames.
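
After downloading, a quick layout check can catch an incomplete scene folder before the long conversion step. The sketch below is an illustration only: `missing_scene_entries` is a hypothetical helper, and the expected entries are the ones this notebook lists for the scene folder (`calibration.json`, `ground_truth.json`, `videos`, `depth_maps`).

```python
import os
import tempfile

# Entries the scene folder is expected to contain, as listed in this notebook.
EXPECTED_ENTRIES = ("calibration.json", "ground_truth.json", "videos", "depth_maps")

def missing_scene_entries(scene_dir, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent from scene_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(scene_dir, name))]

# Demo on a throwaway directory that holds only two of the four entries.
with tempfile.TemporaryDirectory() as scene_dir:
    open(os.path.join(scene_dir, "calibration.json"), "w").close()
    os.mkdir(os.path.join(scene_dir, "videos"))
    print(missing_scene_entries(scene_dir))  # ['ground_truth.json', 'depth_maps']
```

On a real download you would pass the host-side scene path, for example `os.path.join(os.environ["HOST_DATA_DIR"], "MTMC_Tracking_2025/train/Warehouse_014")`, and expect an empty list.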

In [None]:
# Create local dir
!mkdir -p $HOST_DATA_DIR

# Download the MTMC Tracking 2025 from Hugging Face
!pip install --upgrade huggingface-hub[cli]

# Run script
import os
from huggingface_hub import snapshot_download

host_data_dir_path = os.getenv("HOST_DATA_DIR")

# Download train dataset
snapshot_download(
    repo_id="nvidia/PhysicalAI-SmartSpaces",
    repo_type="dataset",
    allow_patterns="MTMC_Tracking_2025/train/Warehouse_014/*",
    local_dir=host_data_dir_path
)

In [None]:
# Check if the dataset is downloaded correctly (Folder should have a calibration.json, ground_truth.json, videos & depth_maps directory)
!ls -lh $HOST_DATA_DIR/MTMC_Tracking_2025/train/Warehouse_014

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env MODEL_DIR = /model
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

We will now convert the dataset from the OV format to a format suitable for training. The Sparse4D model consumes raw images & pkl files containing image paths, calibration & ground truth (GT) for training. The TAO dataset convert tool performs this conversion.
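
Once the pkl files exist, it can be useful to peek at their structure. The internal layout of these files is not documented in this notebook, so the sketch below makes no assumption about it: `summarize` is a hypothetical, generic helper that walks whatever object was unpickled, and the `sample` payload is invented purely for the demo.

```python
import pickle

def summarize(obj, indent=0, max_depth=3):
    """Return indented lines describing the nested structure of a loaded object."""
    pad = " " * indent
    if max_depth == 0:
        return [f"{pad}..."]
    if isinstance(obj, dict):
        lines = [f"{pad}dict ({len(obj)} keys)"]
        for key, value in obj.items():
            lines.append(f"{pad}  {key!r}:")
            lines.extend(summarize(value, indent + 4, max_depth - 1))
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{type(obj).__name__} (length {len(obj)})"]
        if obj:
            # Describe only the first element; entries are assumed to share a layout.
            lines.extend(summarize(obj[0], indent + 2, max_depth - 1))
        return lines
    return [f"{pad}{type(obj).__name__}"]

# Stand-in payload, invented for illustration; a real annotation file will differ.
sample = {"infos": [{"frame": 0, "cams": {"cam_0": "path.jpg"}}]}
restored = pickle.loads(pickle.dumps(sample))
print("\n".join(summarize(restored)))
```

For a real file, replace the stand-in with `pickle.load(open(path, "rb"))` on a host-side path under `$HOST_DATA_DIR/anno_pkls`. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.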

In [None]:
!ls $HOST_DATA_DIR/MTMC_Tracking_2025

In [None]:
# Generate annotation pickle (OVPKL) files for training using TAO DataService
!ls $HOST_SPECS_DIR/convert.yaml
!mkdir -p $HOST_DATA_DIR/anno_pkls

# Create annotation pkl files on all frames. (Conversion process will take ~20 minutes to process a 12 camera scene with 9000 frames each)
!tao dataset annotations convert \
        -e $SPECS_DIR/convert.yaml \
        aicity.root=$DATA_DIR/MTMC_Tracking_2025 \
        aicity.camera_grouping_mode=random \
        results_dir=$DATA_DIR/anno_pkls

# List down all training pkl files.
print("Listing all training annotation pkl files:")
!ls $HOST_DATA_DIR/anno_pkls/

In [None]:
# # Not Required for this dataset: Optionally you may create annotation pkl files for test & validation for your own dataset.
# # Update the aicity.split config to test,val 

# !tao dataset annotations convert \
#         -e $SPECS_DIR/convert.yaml \
#         aicity.root=$DATA_DIR/MTMC_Tracking_2025/ \
#         aicity.split=test \
#         aicity.num_frames=10 \
#         results_dir=$DATA_DIR/anno_pkls

# # List down all validation pkl files.
# print("Listing all validation annotation pkl files:")
# !ls $HOST_DATA_DIR/anno_pkls/test
# !ls $HOST_DATA_DIR/anno_pkls/val

In [None]:
# Not required for this dataset: if you generated a dataset with the NVIDIA Isaac Sim Replicator tool, use the experiment spec below.

# !tao dataset annotations convert \
#         -e $SPECS_DIR/convert.yaml \
#         aicity.root=$DATA_DIR/MTMC_Tracking_2025/ \
#         aicity.rgb_format='h5' \
#         aicity.depth_format='h5' \
#         aicity.split="" \
#         results_dir=$DATA_DIR/anno_pkls

# # List down all validation pkl files.
# print("Listing all validation annotation pkl files:")
# !ls $HOST_DATA_DIR/anno_pkls/test

### 2.2 Download pre-trained model

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/sparse4d_rn101:*

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/sparse4d_rn101:trainable_v1.0 --dest $HOST_MODEL_DIR

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $HOST_MODEL_DIR/sparse4d_rn101_vtrainable_v1.0

## 3. Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * num_frames: number of frames present in 1 scene
    * batch_size: batch size for dataloader
    * num_bev_groups: number of Birds Eye View (BEV) group pkl files used for training
    * num_workers: number of workers
    * classes: list of training classes 
    * data_root: path to the main dataset folder consisting of the raw image folders, calibration & ground truth
    * use_h5_file: flag to identify the dataset folder type
    * train_dataset:
        * ann_file: path to the annotation pkl file generated using the TAO dataset convert tool
        * sequences_split_num: 100
    * val_dataset:
        * ann_file: path to the annotation pkl file generated using the TAO dataset convert tool
    * test_dataset:
        * ann_file: path to the annotation pkl file generated using the TAO dataset convert tool
* model: configure the model settings
    * use_temporal_align: enable enhanced anchor matching across multiple frames for better tracking
    * instance_bank:
        * anchor: path to the initialized anchor numpy file
* train: configure the training hyperparameters
    * num_gpus: number of GPUs
    * num_nodes: number of nodes (num_nodes=1 for single node)
    * validation_interval: evaluate the model every n intervals
    * checkpoint_interval: saves a checkpoint every n intervals
    * optim:
        * lr: learning rate for the rest of the model
    * num_epochs: number of epochs
    * pretrained_backbone_path: path to the pretrained model
    * precision: if set to `bf16-mixed`, training runs with Automatic Mixed Precision (AMP)

Please refer to the TAO documentation about Sparse4D to get all the parameters that are configurable.
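
The commands in this notebook override spec values with dotted `key=value` arguments such as `train.num_gpus=1`. As a small illustration of how a nested setting maps to that form, here is a sketch with a hypothetical helper, `to_overrides`; it is not part of TAO, and the values shown are the ones used elsewhere in this notebook.

```python
def to_overrides(config, prefix=""):
    """Flatten a nested dict into dotted key=value strings, in insertion order."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides

settings = {
    "train": {"num_gpus": 1, "num_nodes": 1},
    "dataset": {"batch_size": 2},
}
print(" ".join(to_overrides(settings)))
# train.num_gpus=1 train.num_nodes=1 dataset.batch_size=2
```

Reading the mapping in reverse is often more useful: an argument like `model.head.instance_bank.anchor=...` targets the `anchor` key nested under `model`, `head`, and `instance_bank` in the spec file.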

In [None]:
!cat $HOST_SPECS_DIR/experiment.yaml

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* *WARNING*: We train Sparse4D on 1408x512 images from 12 cameras, each covering 5 minutes at 30 FPS, which takes a significant amount of time. **We highly recommend training with multiple high-end GPUs (e.g. H100 or A100).** Refer to the model card & documentation for an estimate of the training time.
* Per-epoch training time on a single GPU varies with the data location, network speed, and other factors.
* By default, the model trains with `train.precision=bf16-mixed` for mixed-precision training.
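
To get a rough sense of scale, the figures in the warning above can be multiplied out. This is plain arithmetic on the numbers stated there (12 cameras, 5 minutes, 30 FPS, 1408x512 images), not a measurement of training cost.

```python
cameras = 12
minutes = 5
fps = 30
width, height = 1408, 512

frames_per_camera = minutes * 60 * fps        # 9000 frames per camera
total_frames = cameras * frames_per_camera    # 108000 frames in the scene
pixels_per_frame = width * height             # 720896 pixels per image

print(frames_per_camera, total_frames, pixels_per_frame)
```

The 9000 frames per camera match the per-camera frame count quoted for the annotation conversion step.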

In [None]:
print("For multi-GPU, change train.num_gpus in the spec file (experiment.yaml) based on your machine")
print("For multi-node, change train.num_gpus and train.num_nodes in the spec file (experiment.yaml) based on your machine")
# If you run out of memory, reduce the batch size by passing dataset.batch_size=2

!tao model sparse4d train \
        -e $SPECS_DIR/experiment.yaml \
        train.num_gpus=1 \
        train.num_nodes=1 \
        dataset.data_root=$DATA_DIR/MTMC_Tracking_2025/train \
        dataset.train_dataset.ann_file=$DATA_DIR/anno_pkls/train/ \
        dataset.val_dataset.ann_file=$DATA_DIR/anno_pkls/train/Warehouse_014+bev-sensor-random-0_infos_train.pkl \
        model.head.instance_bank.anchor=$DATA_DIR/anno_pkls/anchor_init_kmeans900.npy \
        results_dir=$RESULTS_DIR

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/sparse4d_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!sudo cp $CHECKPOINT $HOST_RESULTS_DIR/train/sparse4d_model_finetuned.pth
!ls -ltrh $HOST_RESULTS_DIR/train/sparse4d_model_finetuned.pth
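
Checkpoint selection by epoch can also be done in Python. The sketch below assumes that checkpoint file names contain `epoch_<number>`, as the commented `grep epoch_$NUM_EPOCH` line in the cell above suggests. `pick_checkpoint` is a hypothetical helper, and the file names in the demo are invented for illustration.

```python
import re

def pick_checkpoint(filenames, epoch=None):
    """Return the file for `epoch`, or the highest-epoch file when epoch is None."""
    pattern = re.compile(r"epoch_(\d+)")
    by_epoch = {}
    for name in filenames:
        match = pattern.search(name)
        if match:
            by_epoch[int(match.group(1))] = name
    if not by_epoch:
        return None
    if epoch is None:
        return by_epoch[max(by_epoch)]
    return by_epoch.get(epoch)

# Hypothetical file names; check the directory listing for the real ones.
names = ["model_epoch_009.pth", "model_epoch_029.pth", "sparse4d_model_latest.pth"]
print(pick_checkpoint(names, epoch=29))  # model_epoch_029.pth
print(pick_checkpoint(names))            # model_epoch_029.pth
```

In practice the list would come from `os.listdir` on `$HOST_RESULTS_DIR/train`. Files without an epoch number, such as `sparse4d_model_latest.pth`, are ignored by this helper.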

## 5. Evaluate a trained model <a class="anchor" id="head-5"></a>

In this section, we run the `evaluate` tool to evaluate the trained model and produce the mAP metric.

The specification file configures the evaluation parameters, including:

* model: configure the model settings
    * this config should remain the same as your trained model's configuration.
* dataset: configure the dataset and augmentation methods
    * data_root: path to the main dataset folder consisting of the raw image folders, calibration & ground truth
    * test_dataset:
        * ann_file: path to the annotation pkl file generated using the TAO dataset convert tool
    * batch_size: evaluation is supported on only 1 GPU.
* evaluate:
    * checkpoint: path to the model checkpoint

* **NOTE: Set the checkpoint path in the specification file, or override `evaluate.checkpoint` on the command line, to match your setup.**

In [None]:
# Evaluate on TAO model
!tao model sparse4d evaluate \
        -e $SPECS_DIR/experiment.yaml \
        evaluate.checkpoint=$RESULTS_DIR/train/sparse4d_model_finetuned.pth \
        dataset.data_root=$DATA_DIR/MTMC_Tracking_2025/train \
        dataset.test_dataset.ann_file=$DATA_DIR/anno_pkls/train/Warehouse_014+bev-sensor-random-0_infos_train.pkl \
        model.head.instance_bank.anchor=$DATA_DIR/anno_pkls/anchor_init_kmeans900.npy \
        results_dir=$RESULTS_DIR/

## 6. Visualize Inferences <a class="anchor" id="head-6"></a>
In this section, we run the `inference` tool to generate predictions with the trained model and visualize the results. The `inference` tool produces annotated images and an output file in NVSchema containing the 3D bounding boxes & object IDs.

The specification file configures the inference parameters, including:

* model: configure the model settings
    * this config should remain the same as your trained model's configuration
* dataset: configure the dataset and augmentation methods
    * test_dataset:
        * ann_file: path to the annotation pkl file generated using the TAO dataset convert tool
    * batch_size: inference is supported on only 1 GPU.
* inference:
    * checkpoint: path to the model checkpoint
    * output_nvschema: boolean to save output in NVSchema format
* visualize:
    * show: boolean to enable visualization
    * vis_dir: path to store the output visualization results
    * vis_score_threshold: threshold to filter away objects with low confidence for visualization
    * n_images_col: number of images to place horizontally in the grid visualization

* **NOTE: Set the checkpoint path in the specification file, or override `inference.checkpoint` on the command line, to match your setup.**
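
To make the two visualization parameters concrete, the sketch below shows what a score threshold and a column count do in principle. It is an illustration under assumptions, not TAO's implementation: the detection dictionaries are hypothetical, the comparison is assumed to be inclusive (`>=`), and `filter_by_score` and `grid_shape` are made-up helper names.

```python
import math

def filter_by_score(detections, threshold):
    """Keep detections whose score is at least `threshold` (inclusive by assumption)."""
    return [det for det in detections if det["score"] >= threshold]

def grid_shape(n_images, n_images_col):
    """Return (rows, columns) for a grid with n_images_col images per row."""
    if n_images_col < 1:
        raise ValueError("n_images_col must be at least 1")
    return math.ceil(n_images / n_images_col), n_images_col

# Hypothetical detections; the real output format is not shown in this notebook.
detections = [
    {"label": "Person", "score": 0.91},
    {"label": "Pallet", "score": 0.30},
    {"label": "Box", "score": 0.55},
]
kept = filter_by_score(detections, threshold=0.5)
print([det["label"] for det in kept])  # ['Person', 'Box']
print(grid_shape(12, 4))               # (3, 4)
```

With 12 camera views and `n_images_col=4`, the grid would have three rows. Raising the threshold trades fewer spurious boxes for a higher chance of hiding true but low-confidence objects.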

In [None]:
!tao model sparse4d inference \
        -e $SPECS_DIR/experiment.yaml \
        inference.checkpoint=$RESULTS_DIR/train/sparse4d_model_finetuned.pth \
        dataset.data_root=$DATA_DIR/MTMC_Tracking_2025/train \
        dataset.test_dataset.ann_file=$DATA_DIR/anno_pkls/train/Warehouse_014+bev-sensor-random-0_infos_train.pkl \
        model.head.instance_bank.anchor=$DATA_DIR/anno_pkls/anchor_init_kmeans900.npy \
        visualize.show=True \
        visualize.vis_dir=$RESULTS_DIR \
        results_dir=$RESULTS_DIR/

In [None]:
# Combine images to video via ffmpeg.
!sudo apt install -y ffmpeg

import os
from IPython.display import Video

visdir = os.path.join(os.environ['HOST_RESULTS_DIR'], 'visual_trk')

!ffmpeg -y -framerate 24 -pattern_type glob -i '{visdir}/*.jpg' -c:v libx264 -pix_fmt yuv420p '{visdir}/output.mp4'

Video(os.path.join(visdir, "output.mp4"))

## 7. Export model to ONNX <a class="anchor" id="head-7"></a>

In [None]:
# Export the model to ONNX. The resulting ONNX file can then be used for DeepStream deployment.
!tao model sparse4d export \
        -e $SPECS_DIR/experiment.yaml \
        export.checkpoint=$RESULTS_DIR/train/sparse4d_model_finetuned.pth \
        export.onnx_file=$RESULTS_DIR/export/sparse4d_model_finetuned.onnx \
        model.head.instance_bank.anchor=$DATA_DIR/anno_pkls/anchor_init_kmeans900.npy \
        results_dir=$RESULTS_DIR/

This notebook has come to an end.