# 3D Object Detection using TAO BEVFusion

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## What is BEVFusion?

[BEVFusion](https://arxiv.org/abs/2205.13542) is a state-of-the-art 3D object detection model based on camera-LiDAR fusion. To fuse 3D features from the LiDAR sensor with features from 2D images, BEVFusion uses the bird's-eye-view (BEV) feature space as a common feature space for both sensors. It applies a camera-to-BEV view transform to estimate BEV features from a given 2D image, then fuses the BEV features from the camera and LiDAR with a ConvFuser module. Lastly, a 3D detection head performs 3D object detection on the fused features.
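
The fusion step described above can be pictured with a small NumPy sketch. It is only an illustration under stated assumptions, not the TAO or paper implementation: the function name `conv_fuser_sketch`, the channel counts, and the use of a 1x1 convolution (instead of whatever kernel size and normalization the real ConvFuser uses) are all hypothetical. The point it shows is that once both sensors are expressed on the same BEV grid, fusing them reduces to a channel-wise concatenation followed by a convolution.

```python
import numpy as np

def conv_fuser_sketch(cam_bev, lidar_bev, weight, bias):
    """Fuse two BEV feature maps of shape (C, H, W) that share one grid.

    Illustrative only: concatenates along the channel axis, then mixes
    channels with a 1x1 convolution followed by ReLU.
    """
    if cam_bev.shape[1:] != lidar_bev.shape[1:]:
        raise ValueError("camera and LiDAR BEV grids must match")
    stacked = np.concatenate([cam_bev, lidar_bev], axis=0)   # (C_cam + C_lidar, H, W)
    mixed = np.einsum("oc,chw->ohw", weight, stacked)        # 1x1 convolution
    return np.maximum(mixed + bias[:, None, None], 0.0)      # ReLU

# Hypothetical sizes: 8 camera channels, 4 LiDAR channels, 6 fused channels.
rng = np.random.default_rng(0)
cam_bev = rng.normal(size=(8, 16, 16))
lidar_bev = rng.normal(size=(4, 16, 16))
weight = rng.normal(size=(6, 12))
bias = np.zeros(6)

fused = conv_fuser_sketch(cam_bev, lidar_bev, weight, bias)
print(fused.shape)
```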

### Sample prediction of TAO BEVFusion model
<img align="center" src="./sample.jpg" width="960">

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Convert the KITTI dataset into a TAO BEVFusion-compatible input format by parsing the pedestrian class only (the KittiPerson dataset)
* Take a pretrained model and finetune a BEVFusion model on the KittiPerson dataset
* Evaluate the trained model
* Run inference with the trained model and visualize the result

At the end of this notebook, you will have generated a trained `bevfusion` model.

## Table of Contents

This notebook shows an example use case of BEVFusion using the Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate a trained model](#head-5)
6. [Visualize inferences](#head-6)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` to the path of the user's workspace. Please note that the dataset for this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment will be output to `$LOCAL_PROJECT_DIR/bevfusion/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. You should configure it for your specific case so that these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "bevfusion")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "bevfusion", "results")
os.environ["LOCAL_CACHE_DIR"] = os.path.join(os.environ['HOME'], ".cache")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/bevfusion

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       },
       {
           "source": os.environ["LOCAL_CACHE_DIR"],
           "destination": "/.cache"
       },
   ],
   "Envs": [
        {
            "variable": "TAO_TOOLKIT_CACHE",
            "value": "/.cache",
        }
    ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
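
Because the launcher can only see directories that are mapped, it can help to confirm that every mount source exists on the host before launching a task. The helper below is a minimal sketch of such a check; `missing_mount_sources` is a hypothetical name and is not part of the TAO launcher. It takes a dictionary with the same layout as `tao_configs` above.

```python
import os

def missing_mount_sources(config):
    """Return the "source" paths under "Mounts" that are not existing directories."""
    return [
        mount["source"]
        for mount in config.get("Mounts", [])
        if not os.path.isdir(mount["source"])
    ]

# Illustrative config: the first source exists, the second is a made-up path.
example_config = {
    "Mounts": [
        {"source": os.getcwd(), "destination": "/workspace/tao-experiments"},
        {
            "source": os.path.join(os.getcwd(), "hypothetical-missing-dir-for-demo"),
            "destination": "/data",
        },
    ]
}
missing = missing_mount_sources(example_config)
print(missing)
```

In the notebook, calling `missing_mount_sources(tao_configs)` should return an empty list once the `mkdir -p` cell has been run.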

In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the cell that runs `pip3 install nvidia-tao`.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env by using the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running:

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing TAO python package, please make sure of the following software requirements:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
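
A quick way to confirm the interpreter requirement is to compare `sys.version_info` with the bounds in the list above. This is a minimal sketch: the helper name `python_supported` is hypothetical, and the bounds are copied from the `python >=3.7, <=3.10.x` entry rather than queried from the launcher.

```python
import sys

# Bounds taken from the requirements list above (assumed inclusive).
MIN_PY = (3, 7)
MAX_PY = (3, 10)

def python_supported(version=None):
    """Return True if the (major, minor) version lies within the listed range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY

print("Python version supported:", python_supported())
```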

Once you have installed the prerequisites, please log in to the docker registry nvcr.io by running the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# View the versions of the TAO launcher
!tao info
!tao model -h

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

We will use the KITTI detection dataset for this tutorial. For more details, please visit
http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d. Please download the KITTI detection images (http://www.cvlibs.net/download.php?file=data_object_image_2.zip), labels (http://www.cvlibs.net/download.php?file=data_object_label_2.zip), Velodyne LiDAR point clouds (http://www.cvlibs.net/download.php?file=data_object_velodyne.zip) and LiDAR calibration files (http://www.cvlibs.net/download.php?file=data_object_calib.zip) to `$HOST_DATA_DIR`.

The data will then be extracted into the structure below.

```bash
│── ImageSets
│── training
│   ├──calib & velodyne & label_2 & image_2
│── testing
│   ├──calib & velodyne & image_2
```


The `testing` directory will not be used in this notebook as it has no labels. For the `training` dataset, we will convert the labels into pkl format in the following section.

You may use this notebook with your own dataset as well. To do so, please follow the same directory structure as shown above.
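
When bringing your own data, a short check of the directory layout can catch mistakes before conversion. The sketch below only encodes the tree shown above; `missing_dataset_dirs` and `EXPECTED_DIRS` are hypothetical names, and the check does not inspect file contents.

```python
import os

# Sub-directories taken from the directory tree shown above.
EXPECTED_DIRS = [
    "ImageSets",
    "training/calib",
    "training/velodyne",
    "training/label_2",
    "training/image_2",
    "testing/calib",
    "testing/velodyne",
    "testing/image_2",
]

def missing_dataset_dirs(root):
    """Return the expected sub-directories that are absent under `root`."""
    return [
        rel for rel in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(root, *rel.split("/")))
    ]

# Example call; falls back to the current directory if HOST_DATA_DIR is unset.
print(missing_dataset_dirs(os.environ.get("HOST_DATA_DIR", os.getcwd())))
```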

### 2.1 Download and Verify dataset

Once you have received the download links in your email, please populate them in place of `KITTI_IMAGES_DOWNLOAD_URL`, `KITTI_LABELS_DOWNLOAD_URL`, `KITTI_LIDAR_DOWNLOAD_URL` and `CALIB_DOWNLOAD_URL`. The next cell downloads the data and places it in `$HOST_DATA_DIR`.
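
If a placeholder is left unreplaced, `wget` is handed a bare word instead of a link. A small guard such as the one below can flag this before any download is attempted. It is a sketch, with `looks_like_url` being a hypothetical helper; it only checks that the value has an `http` or `https` scheme and a host.

```python
from urllib.parse import urlparse

def looks_like_url(value):
    """Return True if `value` has an http(s) scheme and a network location."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

# An unreplaced placeholder fails the check; a real-looking link passes.
print(looks_like_url("KITTI_IMAGES_DOWNLOAD_URL"))
print(looks_like_url("https://example.com/data_object_image_2.zip"))
```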

In [None]:
import os
!mkdir -p $HOST_DATA_DIR

os.environ["URL_IMAGES"]="KITTI_IMAGES_DOWNLOAD_URL"
!if [ ! -f $HOST_DATA_DIR/data_object_image_2.zip ]; then wget $URL_IMAGES -O $HOST_DATA_DIR/data_object_image_2.zip; else echo "image archive already downloaded"; fi 

os.environ["URL_LABELS"]="KITTI_LABELS_DOWNLOAD_URL"
!if [ ! -f $HOST_DATA_DIR/data_object_label_2.zip ]; then wget $URL_LABELS -O $HOST_DATA_DIR/data_object_label_2.zip; else echo "label archive already downloaded"; fi

os.environ["URL_LIDAR"]="KITTI_LIDAR_DOWNLOAD_URL"
!if [ ! -f $HOST_DATA_DIR/data_object_velodyne.zip ]; then wget $URL_LIDAR -O $HOST_DATA_DIR/data_object_velodyne.zip; else echo "velodyne archive already downloaded"; fi 

os.environ["URL_CALIB"]="CALIB_DOWNLOAD_URL"
!if [ ! -f $HOST_DATA_DIR/data_object_calib.zip ]; then wget $URL_CALIB -O $HOST_DATA_DIR/data_object_calib.zip; else echo "calib archive already downloaded"; fi 

# Download train/val split
!mkdir -p $HOST_DATA_DIR/ImageSets
!if [ ! -f $HOST_DATA_DIR/ImageSets/test.txt ]; then wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O $HOST_DATA_DIR/ImageSets/test.txt; else echo "test.txt archive already downloaded"; fi
!if [ ! -f $HOST_DATA_DIR/ImageSets/train.txt ]; then wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O $HOST_DATA_DIR/ImageSets/train.txt; else echo "train.txt archive already downloaded"; fi
!if [ ! -f $HOST_DATA_DIR/ImageSets/val.txt ]; then wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O $HOST_DATA_DIR/ImageSets/val.txt; else echo "val.txt archive already downloaded"; fi
!if [ ! -f $HOST_DATA_DIR/ImageSets/trainval.txt ]; then wget https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O $HOST_DATA_DIR/ImageSets/trainval.txt; else echo "trainval.txt archive already downloaded"; fi

In [None]:
# Verification of the dataset
!if [ ! -f $HOST_DATA_DIR/data_object_image_2.zip ]; then echo 'Image zip file not found, please download.'; else echo 'Found Image zip file.';fi
!if [ ! -f $HOST_DATA_DIR/data_object_label_2.zip ]; then echo 'Label zip file not found, please download.'; else echo 'Found Labels zip file.';fi
!if [ ! -f $HOST_DATA_DIR/data_object_velodyne.zip ]; then echo 'Velodyne zip file not found, please download.'; else echo 'Found Velodyne zip file.';fi
!if [ ! -f $HOST_DATA_DIR/data_object_calib.zip ]; then echo 'Calib zip file not found, please download.'; else echo 'Found Calib zip file.';fi

In [None]:
# Unpack 
!unzip -u $HOST_DATA_DIR/data_object_image_2.zip -d $HOST_DATA_DIR
!unzip -u $HOST_DATA_DIR/data_object_label_2.zip -d $HOST_DATA_DIR
!unzip -u $HOST_DATA_DIR/data_object_velodyne.zip -d $HOST_DATA_DIR
!unzip -u $HOST_DATA_DIR/data_object_calib.zip -d $HOST_DATA_DIR

### 2.2 Convert dataset to required format

Convert the downloaded KITTI labels into the TAO BEVFusion-compatible pkl format. The converter parses only the pedestrian class from the public KITTI dataset and zero-pads the rotations about the x and z axes, since KITTI provides only the y-rotation.
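
To make the filtering and zero-padding concrete, the sketch below parses KITTI label text, keeps only `Pedestrian` rows, and expands the single `rotation_y` value into three angles. It is an illustration, not the TAO converter: `parse_pedestrians`, the dictionary keys, and the `(x, y, z)` ordering of the rotation tuple are assumptions, and the sample label values are for demonstration only. The column positions follow the public KITTI object label format (type, truncation, occlusion, alpha, 2D box, dimensions, location, rotation_y).

```python
def parse_pedestrians(label_text):
    """Keep Pedestrian rows and pad rotation_y to a three-angle rotation."""
    boxes = []
    for line in label_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 15 or fields[0] != "Pedestrian":
            continue  # skip other classes and malformed rows
        height, width, length = map(float, fields[8:11])
        x, y, z = map(float, fields[11:14])
        rotation_y = float(fields[14])
        boxes.append({
            "location": (x, y, z),
            "dimensions": (height, width, length),
            # KITTI only provides the y-rotation; x and z are zero-padded.
            "rotation": (0.0, rotation_y, 0.0),
        })
    return boxes

sample_labels = """\
Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01
Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
"""
pedestrians = parse_pedestrians(sample_labels)
print(pedestrians)
```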

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results
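
Since the paths above are written from the container's point of view, it can be useful to translate a host path into the path the container will see. The helper below is a sketch of that mapping for a mounts list shaped like the `Mounts` entry of `~/.tao_mounts.json`. `host_to_container` is a hypothetical name, the example host paths are made up, and preferring the longest matching source is a design choice of this sketch rather than launcher behavior.

```python
import posixpath

def host_to_container(host_path, mounts):
    """Map a host path to its container path using the longest matching mount."""
    best = None
    for mount in mounts:
        source = mount["source"].rstrip("/")
        if host_path == source or host_path.startswith(source + "/"):
            if best is None or len(source) > len(best["source"].rstrip("/")):
                best = mount
    if best is None:
        return None  # the path is not visible inside the container
    relative = host_path[len(best["source"].rstrip("/")):].lstrip("/")
    return posixpath.join(best["destination"], relative) if relative else best["destination"]

# Made-up host paths that mirror the mount layout used in this notebook.
example_mounts = [
    {"source": "/home/user/tao", "destination": "/workspace/tao-experiments"},
    {"source": "/home/user/tao/data/bevfusion", "destination": "/data"},
]
print(host_to_container("/home/user/tao/data/bevfusion/training", example_mounts))
print(host_to_container("/home/user/tao/bevfusion/results", example_mounts))
```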

In [None]:
!tao model bevfusion convert -e $SPECS_DIR/convert.yaml

### 2.3 Download pre-trained model

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/bevfusion:*

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/bevfusion:1.0 --dest $LOCAL_PROJECT_DIR/bevfusion/

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_PROJECT_DIR/bevfusion/bevfusion_v1.0/

## 3. Provide training specification <a class="anchor" id="head-3"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * type: KittiPersonDataset
    * root_dir: root path for dataset
    * gt_box_type: coordinate system of the label bounding boxes. Options are lidar or camera
    * default_cam_key: default camera name. Defaults to CAM2 in Kitti
    * train_dataset: 
        * repeat_time: number of times the training dataset is repeated. Defaults to 2 for Kitti
        * ann_file: path to the training annotation pkl file generated via bevfusion convert
        * data_prefix: 
            * pts: directory prefix name for lidar point cloud data
            * img: directory prefix name for camera image data
        * batch_size: batch size for train dataloader
        * workers: number of workers to do train data loading
    * val_dataset:
        * ann_file: path to the validation annotation pkl file generated via bevfusion convert
        * data_prefix: 
            * pts: directory prefix name for lidar point cloud data
            * img: directory prefix name for camera image data
        * batch_size: batch size for val dataloader
        * workers: number of workers to do val data loading
* model: configure the model setting
    * type: model name. Currently only BEVFusion is supported
    * point_cloud_range: point cloud range for the data
    * voxel_size: voxel size
    * grid_size: grid size for detection head
* train: configure the training hyperparameter
    * num_gpus: number of gpus
    * num_nodes: number of nodes (num_nodes=1 for single node)
    * validation_interval: validation interval
    * pretrained_checkpoint: pretrained checkpoint for finetuning
    * max_epoch: number of epochs
    * optimizer:
        * type: optimizer name. Defaults to AdamW
        * lr: learning rate for the model
    * lr_scheduler: dictionary for learning rate scheduler

Please refer to the TAO documentation about BEVFusion to get all the parameters that are configurable.
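
The training command in the next section overrides spec values on the command line with dotted keys such as `train.pretrained_checkpoint=...`. The sketch below shows how a nested spec maps onto that notation. It assumes the same dotted convention applies to every nested key listed above; `to_overrides` is a hypothetical helper and the values are placeholders, not recommended settings.

```python
def to_overrides(spec, prefix=""):
    """Flatten a nested spec dictionary into dotted `key=value` strings."""
    overrides = []
    for key, value in spec.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides

# Placeholder values for a few of the keys described above.
example_spec = {
    "train": {
        "num_gpus": 1,
        "max_epoch": 2,
        "optimizer": {"type": "AdamW"},
    },
}
overrides = to_overrides(example_spec)
print(" ".join(overrides))
```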


## 4. Run TAO training <a class="anchor" id="head-4"></a>
* Provide the sample spec file and the output directory location for models
* Evaluation uses Kitti3D metrics. For more info, please refer to: https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d

* Unlike the [original BEVFusion paper](https://arxiv.org/abs/2205.13542), we use three rotation angles for the labels. As a result, our detection head learns a three-angle rotation representation, so our model is not compatible with the original model code.

In [None]:
!tao model bevfusion train -e $SPECS_DIR/experiment.yaml \
                            train.pretrained_checkpoint=/workspace/tao-experiments/bevfusion/bevfusion_v1.0/tao3d_bevfusion_epoch4.pth \
                            results_dir=$RESULTS_DIR

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
%env NUM_EPOCH=2

In [None]:
# Get the name of the checkpoint corresponding to your set epoch
tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
%env CHECKPOINT={tmp[0]}

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/bevfusion_model.pth
!ls -ltrh $HOST_RESULTS_DIR/train/bevfusion_model.pth
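
Note that `grep epoch_$NUM_EPOCH` also matches longer epoch numbers (for example, `epoch_2` matches `epoch_20`). When many checkpoints are saved, an exact match may be safer. The sketch below assumes only what the cell above implies, namely that checkpoint names contain `epoch_<N>` and end in `.pth`; `checkpoint_for_epoch` and the example filenames are hypothetical.

```python
import re

def checkpoint_for_epoch(filenames, epoch):
    """Return the first filename whose epoch number equals `epoch` exactly."""
    # (?!\d) rejects names where more digits follow, e.g. epoch_20 for epoch 2.
    pattern = re.compile(rf"epoch_{epoch}(?!\d).*\.pth$")
    matches = [name for name in filenames if pattern.search(name)]
    return matches[0] if matches else None

# Hypothetical checkpoint names for illustration.
names = ["epoch_20.pth", "epoch_2.pth", "epoch_12.pth"]
selected = checkpoint_for_epoch(names, 2)
print(selected)
```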

## 5. Evaluate a trained model <a class="anchor" id="head-5"></a>

In this section, we run the `evaluate` tool to evaluate the trained model and produce the AP40 metric.

We provide evaluate.yaml specification files to configure the evaluate parameters including:

* dataset: configure the dataset and augmentation methods
    * type: KittiPersonDataset
    * root_dir: root path for dataset
    * gt_box_type: coordinate system of the label bounding boxes. Options are lidar or camera
    * default_cam_key: default camera name. Defaults to CAM2 in Kitti
    * test_dataset:
        * ann_file: path to the test annotation pkl file generated via bevfusion convert
        * data_prefix: 
            * pts: directory prefix name for lidar point cloud data
            * img: directory prefix name for camera image data
        * batch_size: batch size for evaluate dataloader
        * workers: number of workers to do evaluate data loading
* model: configure the model setting
    * type: model name. Currently only BEVFusion is supported
    * point_cloud_range: point cloud range for the data
    * voxel_size: voxel size
    * grid_size: grid size for detection head
* evaluate:
  * num_gpus: number of gpus to use for evaluate

* **NOTE: We reported 3D_AP40_moderate_strict as our AP40. Please look for this metric in the evaluation results.**
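
For intuition about the metric, the sketch below computes a 40-point interpolated average precision from a precision-recall curve. It assumes AP40 refers to the KITTI-style AP sampled at 40 recall positions (1/40, 2/40, ..., 1). It is not the evaluation code used by TAO, and `ap_r40` is a hypothetical name.

```python
def ap_r40(recalls, precisions):
    """Average the interpolated precision over recall levels 1/40 .. 40/40."""
    total = 0.0
    for i in range(1, 41):
        level = i / 40
        # Interpolated precision: best precision at any recall >= this level.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable, default=0.0)
    return total / 40

# A detector that is perfectly precise but only reaches 50% recall.
score = ap_r40([0.25, 0.5], [1.0, 1.0])
print(score)
```

In this example only the first 20 of the 40 recall levels are reachable, so the score is half of the maximum.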

In [None]:
!tao model bevfusion evaluate \
            -e $SPECS_DIR/experiment.yaml \
            evaluate.checkpoint=$RESULTS_DIR/train/bevfusion_model.pth \
            results_dir=$RESULTS_DIR/

## 6. Visualize Inferences <a class="anchor" id="head-6"></a>
In this section, we run the `inference` tool to generate inferences on the trained models and visualize the results. The `inference` tool produces annotated image outputs and json files that contain prediction information.

We provide inference.yaml specification files to configure the inference parameters including:
* dataset: configure the dataset and augmentation methods
    * type: KittiPersonDataset
    * root_dir: root path for dataset
    * gt_box_type: coordinate system of the label bounding boxes. Options are lidar or camera
    * default_cam_key: default camera name. Defaults to CAM2 in Kitti
    * test_dataset:
        * ann_file: path to the test annotation pkl file generated via bevfusion convert
        * data_prefix: 
            * pts: directory prefix name for lidar point cloud data
            * img: directory prefix name for camera image data
        * batch_size: batch size for evaluate dataloader
        * workers: number of workers to do evaluate data loading
* model: configure the model setting
    * type: model name. Currently only BEVFusion is supported
    * point_cloud_range: point cloud range for the data
    * voxel_size: voxel size
    * grid_size: grid size for detection head
* inference:
    * num_gpus: number of gpus to use for inference
    * conf_threshold: confidence score threshold to filter out low-score predictions
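
The effect of `conf_threshold` can be sketched as a simple filter over scored predictions. The prediction layout used here (a list of dictionaries with a `score` key) and the inclusive comparison are assumptions made for illustration; the actual JSON schema written by the `inference` tool is not shown in this notebook.

```python
def filter_by_confidence(predictions, conf_threshold):
    """Keep predictions whose score is at least `conf_threshold`."""
    return [pred for pred in predictions if pred["score"] >= conf_threshold]

# Hypothetical predictions for illustration.
example_predictions = [
    {"label": "Pedestrian", "score": 0.92},
    {"label": "Pedestrian", "score": 0.15},
    {"label": "Pedestrian", "score": 0.40},
]
kept = filter_by_confidence(example_predictions, 0.4)
print(kept)
```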

To run inference on a single image and point cloud file, use the inference_single.yaml file. Note that cam2img and lidar2cam need to be provided as shown in the yaml file.
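
The two calibration matrices are needed because a LiDAR point must first be moved into the camera frame and then projected onto the image plane. The sketch below shows that chain under the assumption that both `lidar2cam` and `cam2img` are 4x4 homogeneous matrices. The helper names and the example matrices are made up for illustration and are not taken from inference_single.yaml.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def project_lidar_point(point_xyz, lidar2cam, cam2img):
    """Project a LiDAR point to pixel coordinates, or None if behind the camera."""
    cam_point = matvec(lidar2cam, [*point_xyz, 1.0])   # LiDAR frame -> camera frame
    u, v, depth = matvec(cam2img, cam_point)[:3]       # camera frame -> image plane
    if depth <= 0:
        return None
    return (u / depth, v / depth)

# Illustrative calibration: identity extrinsics, focal length 100, center (50, 40).
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
intrinsics = [[100, 0, 50, 0], [0, 100, 40, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
pixel = project_lidar_point((1.0, 2.0, 10.0), identity, intrinsics)
print(pixel)
```

A real KITTI `lidar2cam` matrix also rotates the axes between the two sensors; the identity matrix is used here only to keep the arithmetic easy to follow.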

In [None]:
!tao model bevfusion inference \
            -e $SPECS_DIR/experiment.yaml \
            inference.checkpoint=$RESULTS_DIR/train/bevfusion_model.pth \
            results_dir=$RESULTS_DIR

In [None]:
!tao model bevfusion inference \
            -e $SPECS_DIR/inference_single.yaml \
            inference.checkpoint=$RESULTS_DIR/train/bevfusion_model.pth \
            results_dir=$RESULTS_DIR

This notebook has come to an end.