# Object Detection using TAO Grounding DINO

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which you take a model trained on one task and retrain it for a different task.

Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## What is Grounding DINO?

[Grounding DINO](https://arxiv.org/abs/2303.05499) is a state-of-the-art open-set object detection model based on DINO. Grounding DINO can detect arbitrary objects from human inputs such as category names or referring expressions. Compared to DINO, Grounding DINO adds a text encoder and cross-attention modules to align the bounding boxes with the given categories or phrases.

In TAO, only a single type of backbone network is supported: [Swin](https://arxiv.org/abs/2103.14030). In this notebook, we use the pretrained Swin-Tiny Grounding DINO and show how to fine-tune it on the [Hard Hat Worker](https://public.roboflow.com/object-detection/hard-hat-workers/) dataset to reach a state-of-the-art mAP.

### Sample prediction of Swin-Tiny + Grounding DINO model
<img align="center" src="sample.jpg" width="960">

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and finetune a Grounding DINO model on Hard Hat Worker dataset
* Evaluate the trained model

For the inference and deployment workflow, please refer to the zero-shot inference notebook.

## Table of Contents

This notebook shows an example use case of Grounding DINO using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Run zero-shot evaluation](#head-3)
4. [Provide training specification](#head-4)
5. [Run TAO training](#head-5)
6. [Evaluate a trained model](#head-6)
7. [Visualize inferences](#head-7)
8. [Deploy](#head-8)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

This notebook requires you to set an environment variable called `$LOCAL_PROJECT_DIR` to the path of your workspace. The dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collateral generated by the TAO experiment is written to `$LOCAL_PROJECT_DIR/grounding_dino/results`. More information on how to set up the dataset and on the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses Docker containers under the hood, and **for our data and results directories to be visible to the container, they need to be mapped**. The launcher can be configured through the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options such as environment variables and the amount of shared memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. You should configure it for your specific case so that these directories are correctly visible to the Docker container.
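
To make the mapping concrete, the sketch below translates a host path into the path the container would see, given a list of `source`/`destination` pairs shaped like the `Mounts` section of `~/.tao_mounts.json`. The function name `to_container_path` and the longest-prefix rule are illustrative assumptions and are not part of the TAO launcher; the example paths mirror the ones used in this notebook.

```python
import os


def to_container_path(host_path, mounts):
    """Translate a host path into the path seen inside the container.

    `mounts` is a list of {"source": ..., "destination": ...} dicts.
    The most specific (longest) matching source wins. Returns None when
    no mount covers the path, meaning the container cannot see it.
    """
    host_path = os.path.abspath(os.path.expanduser(host_path))
    best = None
    for mount in mounts:
        source = os.path.abspath(os.path.expanduser(mount["source"]))
        prefix = source.rstrip(os.sep) + os.sep
        if host_path == source or host_path.startswith(prefix):
            if best is None or len(source) > len(best[0]):
                best = (source, mount["destination"])
    if best is None:
        return None
    relative = os.path.relpath(host_path, best[0])
    if relative == ".":
        return best[1]
    return best[1].rstrip("/") + "/" + relative.replace(os.sep, "/")


# Example mounts (illustrative values only).
example_mounts = [
    {"source": "/home/workbench", "destination": "/opt/nvidia/tao-experiments"},
    {"source": "/home/workbench/data", "destination": "/opt/nvidia/tao-experiments/data"},
]
print(to_container_path("/home/workbench/data/raw-data", example_mounts))
```

A `None` result is a useful early warning: any path passed to a `tao` command that is not covered by a mount will not exist inside the container.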


In [2]:
import os
# !echo ${PWD}
# !ls /home/scopescan/workspace/grounding_dino-2/tao-experiments
workspace_file = os.path.expanduser("~/tao-experiments")
print(workspace_file)
# IPython expands $workspace_file to the value of the Python variable.
!ls $workspace_file

/home/workbench/tao-experiments


In [3]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
# %env LOCAL_PROJECT_DIR=tao-experiments
# %env LOCAL_PROJECT_DIR=/project/grounding_dino/tao-experiments
# workspace = os.path.expanduser("~")
# print(workspace)
%env LOCAL_PROJECT_DIR=/home/workbench

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "grounding_dino", "results")
# ----
os.environ["HOST_IMAGE_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "raw-data")
os.environ["HOST_ANNOTATIONS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "annotations")
os.environ["HOST_SPECS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "specs")
os.environ["HOST_CONVERT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "convert")
# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/grounding_dino

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

print('==============================')
!echo $LOCAL_PROJECT_DIR
!ls $LOCAL_PROJECT_DIR
print('---------')
!echo $HOST_SPECS_DIR
!ls $HOST_SPECS_DIR
print('---------')
!echo $HOST_DATA_DIR
!ls $HOST_DATA_DIR
print('---------')
!echo $HOST_RESULTS_DIR
!ls $HOST_RESULTS_DIR

env: LOCAL_PROJECT_DIR=/home/workbench
/home/workbench
data  grounding_dino  ngccli
---------
/project/grounding_dino/specs
convert.yaml	     evaluate.yaml  gen_trt_engine.yaml  train.yaml
download_hardhat.sh  export.yaml    infer.yaml
---------
/home/workbench/data
annotations  hardhat.zip  odvg	raw-data  specs
---------
/home/workbench/grounding_dino/results
evaluate


In [None]:
print(os.path.expanduser("~"))
print(os.environ["LOCAL_PROJECT_DIR"])

In [8]:
!mkdir -p $HOST_DATA_DIR
!mkdir -p $HOST_SPECS_DIR
!mkdir -p $HOST_RESULTS_DIR

In [9]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            # "source": os.path.expanduser("~"),
            "destination": "/opt/nvidia/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data"
       },
       {
           "source": os.environ["HOST_IMAGE_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/raw-data"
       },
       {
           "source": os.environ["HOST_ANNOTATIONS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/annotations"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/results"
       },
       {
           "source": os.path.expanduser("~/.cache"),
           "destination": "/.cache"
       }
   ],
   "DockerOptions": {
        "shm_size": "64G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "grounding_dino",
        "privileged": True
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [4]:
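# Variant of the previous cell: mounts the home directory instead of
# LOCAL_PROJECT_DIR and uses the "workbench" Docker network.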

# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            # "source": os.environ["LOCAL_PROJECT_DIR"],
            "source": os.path.expanduser("~"),
            "destination": "/opt/nvidia/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data"
       },
       {
           "source": os.environ["HOST_IMAGE_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/raw-data"
       },
       {
           "source": os.environ["HOST_ANNOTATIONS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/annotations"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/data/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/opt/nvidia/tao-experiments/results"
       },
       {
           "source": os.path.expanduser("~/.cache"),
           "destination": "/.cache"
       }
   ],
   "DockerOptions": {
        "shm_size": "64G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "workbench",
        "privileged": True
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a Python package distributed as a Python wheel listed in the `nvidia-pyindex` Python index. You may install the launcher by executing the install cell below.

TAO Toolkit recommends running the TAO launcher in a virtual environment with Python 3.10. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual environment using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, set the version of Python to be used in the virtual environment through the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where `x` is the minor version of your interpreter, within the supported range listed below (3.7 through 3.10).

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
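
As a quick sanity check before installing the launcher, the sketch below tests whether an interpreter version falls inside the Python range listed above (3.7 through 3.10). The helper name `python_version_supported` is hypothetical; it only compares version tuples and does not check the Docker or driver requirements.

```python
import sys

# Supported range taken from the requirements list above.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def python_version_supported(version_info=None):
    """Return True if (major, minor) lies within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    print("Supported Python:", python_version_supported())
```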

Once you have installed the prerequisites, log in to the Docker registry `nvcr.io` by running the command below:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [7]:
# View the versions of the TAO launcher
!tao info

Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024


# COPY DATA TO CONTAINER

In [1]:
! docker ps

CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                                NAMES
3aab2ca95355   project-nim-anywhere   "/entrypoint.sh tail…"   2 minutes ago    Up 2 minutes    3030/tcp, 7070/tcp                   project-nim-anywhere
f85b48b6b8d3   traefik:v2.10.7        "/entrypoint.sh trae…"   36 minutes ago   Up 36 minutes   80/tcp, 127.0.0.1:10000->10000/tcp   workbench-proxy


In [3]:
!docker run -d \
    -v /project/grounding_dino/tao-experiments:/opt/nvidia/tao-experiments \
    nvcr.io/nvidia/tao/tao-toolkit:5.5.0-data-services

c979b7186efb7ef6a92def8976c67d0f4cf30e7e3e032e60afc87fad5e0bdef2


In [20]:

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "grounding_dino", "results")
# ----
os.environ["HOST_IMAGE_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "raw-data")
os.environ["HOST_ANNOTATIONS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "annotations")
os.environ["HOST_SPECS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data", "specs")
os.environ["HOST_CONVERT_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "convert")


# SET CONTAINER VARIABLES
%env IMAGE_DIR = /opt/nvidia/tao-experiments/images
%env RESULTS_DIR = /opt/nvidia/tao-experiments/results
%env SPECS_DIR = /opt/nvidia/tao-experiments/specs


env: IMAGE_DIR=/opt/nvidia/tao-experiments/images
env: RESULTS_DIR=/opt/nvidia/tao-experiments/results
env: SPECS_DIR=/opt/nvidia/tao-experiments/specs


In [17]:
!echo $HOST_IMAGE_DIR
!echo $HOST_SPECS_DIR

/project/grounding_dino/tao-experiments/data/raw-data
/project/grounding_dino/tao-experiments/data/specs


In [22]:
# COPY IMAGES TO CONTAINER
# !docker cp $HOST_IMAGE_DIR/ 7b858ab455e1:/opt/nvidia/tao-experiments/data/

# COPY SPEC FILES TO CONTAINER
# !docker cp $HOST_SPECS_DIR 27f246cad250:/opt/nvidia/tao-experiments/

# COPY ANNOTATIONS FILES TO CONTAINER
# !docker cp $HOST_ANNOTATIONS_DIR 27f246cad250:/opt/nvidia/tao-experiments/data/

# COPY ODVG FILES TO CONTAINER
# !docker cp $HOST_DATA_DIR/odvg 27f246cad250:/opt/nvidia/tao-experiments/data/

Successfully copied 3.91MB to 27f246cad250:/opt/nvidia/tao-experiments/data/
Successfully copied 4.07MB to 27f246cad250:/opt/nvidia/tao-experiments/data/


## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

We will use the Hard Hat Worker dataset for this tutorial. The following script downloads the dataset automatically.

In [None]:
# # Create local dir
# !mkdir -p $HOST_DATA_DIR
# # Download the data
# !bash $HOST_SPECS_DIR/download_hardhat.sh $HOST_DATA_DIR

In [None]:
# # Verification
# !ls -l $HOST_DATA_DIR/raw-data/train2017

In [None]:
# # Create ODVG folder
# !mkdir -p $HOST_DATA_DIR/odvg
# !mkdir -p $HOST_DATA_DIR/odvg/annotations

# # NOTE: The following paths are set from the perspective of the TAO Docker.

# # The data is saved here
# %env DATA_DIR = /data
# %env SPECS_DIR = /specs
# %env RESULTS_DIR = /results

In [7]:
!echo $SPECS_DIR
!echo $DATA_DIR

/specs
/data


In [112]:
# Convert COCO to ODVG format required for Grounding DINO
!tao dataset annotations convert \
            -e $SPECS_DIR/convert.yaml \
            coco.ann_file=$DATA_DIR/HardHatWorkers/raw/train/annotations_without_background.json \
            results_dir=$DATA_DIR/odvg/annotations/

2024-11-01 16:17:09,867 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-11-01 16:17:09,918 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-data-services
2024-11-01 16:17:10,005 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2024-11-01 16:17:12,845 - TAO Toolkit - matplotlib.font_manager - INFO] generated new fontManager
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Traceback (most recent call last):
  File "/usr/local/bin/annotations", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/annotations/entrypoint/annotations.py", line 43, in main
    launch(vars(args), unknown_args, subtasks, task="annotations")
  File "/usr/local/lib/python3.10

In [None]:
# # Convert COCO validation annotations to have contiguous category ids (0 to number of classes - 1).
# # This is required for computing validation loss during Grounding DINO training.
# !tao dataset annotations convert \
#             -e $SPECS_DIR/convert.yaml \
#             coco.ann_file=$DATA_DIR/HardHatWorkers/raw/valid/annotations_without_background.json \
#             results_dir=$DATA_DIR/odvg/annotations/ \
#             data.output_format="COCO" \
#             coco.use_all_categories=True
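
The requirement that validation category ids run from 0 to (number of classes - 1) can be pictured with the sketch below. It is not the implementation behind `tao dataset annotations convert`; it is a minimal illustration, on an in-memory COCO-style dictionary, of what a contiguous remapping produces. The function name `remap_category_ids` and the sample categories are assumptions.

```python
import copy


def remap_category_ids(coco):
    """Return (remapped_copy, id_map) with category ids renumbered 0..N-1.

    The original ids are sorted and assigned new ids in that order, and
    every annotation's `category_id` is updated to match.
    """
    remapped = copy.deepcopy(coco)
    old_ids = sorted(cat["id"] for cat in remapped["categories"])
    id_map = {old: new for new, old in enumerate(old_ids)}
    for cat in remapped["categories"]:
        cat["id"] = id_map[cat["id"]]
    for ann in remapped["annotations"]:
        ann["category_id"] = id_map[ann["category_id"]]
    return remapped, id_map


# Illustrative sample: ids start at 1, as is common in COCO exports.
sample = {
    "categories": [
        {"id": 1, "name": "head"},
        {"id": 2, "name": "helmet"},
        {"id": 3, "name": "person"},
    ],
    "annotations": [
        {"id": 10, "image_id": 0, "category_id": 2, "bbox": [5, 5, 20, 20]},
    ],
}
remapped, id_map = remap_category_ids(sample)
print(id_map)
```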

### 2.2 Download pre-trained model

We will use the NGC CLI to get the pre-trained models. For more details, go to [ngc.nvidia.com](https://ngc.nvidia.com) and click SETUP in the navigation bar.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
%env CLI=ngccli_cat_linux.zip
!mkdir -p $LOCAL_PROJECT_DIR/ngccli

# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))

In [None]:
!ngc registry model list nvidia/tao/grounding_dino:*

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/grounding_dino:grounding_dino_swin_tiny_commercial_trainable_v1.0 --dest $LOCAL_PROJECT_DIR/grounding_dino/

In [23]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_PROJECT_DIR/grounding_dino/grounding_dino_vgrounding_dino_swin_tiny_commercial_trainable_v1.0/

Check that model is downloaded into dir.
total 2022184
-rwxrwxrwx 1 workbench workbench       2360 Sep 25 16:28 experiment.yaml
-rwxrwxrwx 1 workbench workbench 2070704394 Sep 25 16:28 grounding_dino_swin_tiny_commercial_trainable.pth


## 3. Run zero-shot evaluation <a class="anchor" id="head-3"></a>

Because Grounding DINO is a multi-modal object detector with a text encoder, we can run zero-shot evaluation on any dataset using the class category names as input. Let's look at the zero-shot mAP of our pretrained Grounding DINO model.
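
Since the text inputs are just the class names, one way to obtain them is to read the `categories` list of a COCO-format annotation file. The sketch below does this for an in-memory dictionary; the helper name `category_names` and the sample categories are illustrative assumptions, and how TAO turns names into captions is governed by the spec file, not by this code.

```python
def category_names(coco):
    """Return category names from a COCO-style dict, ordered by category id."""
    categories = sorted(coco["categories"], key=lambda cat: cat["id"])
    return [cat["name"] for cat in categories]


# Illustrative sample only; a real file would be loaded with json.load().
sample = {
    "categories": [
        {"id": 2, "name": "person"},
        {"id": 0, "name": "head"},
        {"id": 1, "name": "helmet"},
    ]
}
print(category_names(sample))
```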

In [33]:
# Zero-shot evaluation
!tao model grounding_dino evaluate \
            -e $SPECS_DIR/evaluate.yaml \
            evaluate.checkpoint=/workspace/tao-experiments/grounding_dino/grounding_dino_vgrounding_dino_swin_tiny_commercial_trainable_v1.0/grounding_dino_swin_tiny_commercial_trainable.pth \
            results_dir=$RESULTS_DIR

2024-09-25 11:33:30,961 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-09-25 11:33:31,011 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2024-09-25 11:33:31,056 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
[2024-09-25 16:33:34,770 - TAO Toolkit - root - INFO] Using GPUs [0, 1] (total 2)
'evaluate.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
'evaluate.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hyd

## 4. Provide training specification <a class="anchor" id="head-4"></a>

We provide specification files to configure the training parameters including:

* dataset: configure the dataset and augmentation methods
    * train_data_sources:
        * image_dir: the root directory for train images
        * json_file: ODVG annotation file
        * label_map: category id and category mapping
    * val_data_sources: 
        * image_dir: the root directory for validation images
        * json_file: annotation file for validation data. Required to be in COCO JSON format, with category ids in the range 0 to (number of classes - 1)
    * max_labels: max number of positive + negative labels seen in a single batch. Larger max_labels usually result in better accuracy with longer training time.
    * batch_size: batch size for dataloader
    * workers: number of workers to do data loading
* model: configure the model setting
    * pretrained_backbone_path: path to the pretrained backbone model. Only Swin-variants are supported
    * num_feature_levels: number of feature levels used from backbone
    * dec_layers: number of decoder layers
    * enc_layers: number of encoder layers
    * num_queries: number of queries for the model
    * num_select: number of top-k proposals to select from
    * use_dn: flag to enable denoising during training
    * dropout_ratio: dropout ratio
* train: configure the training hyperparameters
    * num_gpus: number of GPUs
    * num_nodes: number of nodes (num_nodes=1 for a single node)
    * validation_interval: validation interval
    * optim:
        * lr_backbone: learning rate for backbone
        * lr: learning rate for the rest of the model
        * lr_steps: learning rate decay step milestone (MultiStep)
    * num_epochs: number of epochs
    * activation_checkpoint: recompute activations in the backward to save GPU memory. Default is `True`.
    * precision: if set to fp16/bf16, training runs with Automatic Mixed Precision (AMP)
    * distributed_strategy: Default is `ddp`. `ddp_sharded` is also supported.

Please refer to the TAO documentation about Grounding DINO to get all the parameters that are configurable.
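
To see how `lr` and `lr_steps` interact, the sketch below evaluates a multi-step schedule in closed form: the learning rate is multiplied by a decay factor each time a milestone epoch is reached. The decay factor `gamma=0.1` is an assumption (it is the default of PyTorch's `MultiStepLR`); check the TAO documentation for the value Grounding DINO actually uses. The sample values `lr=2e-5`, `lr_steps=[4]`, and `num_epochs=6` come from the spec printed in the next cell.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, epoch, gamma=0.1):
    """Learning rate at `epoch` (0-based) under a multi-step decay schedule.

    The rate is scaled by `gamma` once for every milestone <= epoch.
    `gamma=0.1` is an assumed default, not a value read from the TAO spec.
    """
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


# Values taken from the sample train.yaml in this notebook.
for epoch in range(6):
    print(epoch, multistep_lr(2e-5, [4], epoch))
```

Under these assumptions, the rate stays at its base value for epochs 0 through 3 and drops by a factor of ten from epoch 4 onward.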


In [34]:
!cat $HOST_SPECS_DIR/train.yaml

train:
  num_gpus: 1
  num_nodes: 1
  validation_interval: 1
  optim:
    lr_backbone: 2e-06
    lr: 2e-5
    lr_steps: [4]
    momentum: 0.9
  num_epochs: 6
  freeze: ["backbone", "bert", "transformer.encoder", "input_proj"]
  pretrained_model_path: /workspace/tao-experiments/grounding_dino/grounding_dino_vgrounding_dino_swin_tiny_commercial_trainable_v1.0/grounding_dino_swin_tiny_commercial_trainable.pth
  precision: bf16
dataset:
  train_data_sources:
    - image_dir: /data/HardHatWorkers/raw/train/
      json_file: /data/odvg/annotations/annotations_without_background_odvg.jsonl
      label_map: /data/odvg/annotations/annotations_without_background_odvg_labelmap.json
  val_data_sources:
    image_dir: /data/HardHatWorkers/raw/valid/
    json_file: /data/odvg/annotations/annotations_without_background_remapped.json
  max_labels: 80
  batch_size: 8
  workers: 8
model:
  backbone: swin_tiny_224_1k
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  n

## 5. Run TAO training <a class="anchor" id="head-5"></a>
* Provide the sample spec file and the output directory location for models
* Evaluation uses COCO metrics. For more info, please refer to: https://cocodataset.org/#detection-eval
* We only finetune the decoders and linear layers of Grounding DINO and freeze most other layers. Depending on the size of your dataset, unfreezing other parts of the network can help boost your final mAP.
* Training can be completed within several hours on a single GPU with about 20 GB of VRAM. We recommend using more powerful GPUs if your dataset is larger or you want to fine-tune a larger variant of Grounding DINO.
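
COCO-style evaluation decides whether a prediction matches a ground-truth box by their intersection over union (IoU), and averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The helper below is a minimal sketch of the IoU computation for boxes given as `(x1, y1, x2, y2)`; it is illustrative only and is not the code that TAO or the COCO API runs.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten IoU thresholds COCO averages over: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

print(box_iou((0, 0, 2, 2), (1, 0, 3, 2)))
```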

In [None]:
print("For multi-GPU, change train.num_gpus in train.yaml or via the command line based on your machine to the desired number of GPUs.")
os.environ["NUM_TRAIN_GPUS"] = "1"

!tao model grounding_dino train \
           -e $SPECS_DIR/train.yaml \
           train.num_gpus=$NUM_TRAIN_GPUS \
           results_dir=$RESULTS_DIR

In [None]:
print('Trained checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=006

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/gdino_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/grounding_dino_model.pth
!ls -ltrh $HOST_RESULTS_DIR/train/grounding_dino_model.pth
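
The checkpoint selection above can also be written in plain Python. This sketch assumes, as the commented commands suggest, that per-epoch checkpoint file names contain `epoch_` followed by a zero-padded three-digit epoch number, and that the most recent checkpoint is named `gdino_model_latest.pth`. The helper `select_checkpoint` and the per-epoch file names in the example are hypothetical.

```python
import os


def select_checkpoint(paths, epoch=None, latest_name="gdino_model_latest.pth"):
    """Pick a checkpoint path from `paths`.

    With `epoch=None`, return the file named `latest_name`; otherwise
    return the first path (in sorted order) whose file name contains
    `epoch_<NNN>`. Returns None when nothing matches.
    """
    if epoch is None:
        wanted = [p for p in paths if os.path.basename(p) == latest_name]
    else:
        tag = "epoch_{:03d}".format(epoch)
        wanted = [p for p in paths if tag in os.path.basename(p)]
    return sorted(wanted)[0] if wanted else None


# Hypothetical directory listing, for illustration only.
example = [
    "/results/train/model_epoch_005.pth",
    "/results/train/model_epoch_006.pth",
    "/results/train/gdino_model_latest.pth",
]
print(select_checkpoint(example, epoch=6))
print(select_checkpoint(example))
```

In the notebook, `paths` could come from `glob.glob(os.path.join(os.environ["HOST_RESULTS_DIR"], "train", "*.pth"))`.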

## 6. Evaluate a trained model <a class="anchor" id="head-6"></a>

In this section, we run the `evaluate` tool to evaluate the trained model and produce the mAP metric.

The `evaluate.yaml` specification file configures the evaluation parameters, including:

* model: configure the model setting
    * this config should remain the same as your trained model's configuration.
* dataset: configure the dataset and augmentation methods
    * test_data_sources:
        * image_dir: the root directory for evaluation images
        * json_file: annotation file for the evaluation data. Required to be in COCO JSON format, with category ids in the range 0 to (number of classes - 1)
    * batch_size
    * workers
* evaluate:
    * num_gpus: number of gpus
    * conf_threshold: a threshold for confidence scores
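
The effect of `conf_threshold` can be sketched as a simple filter over scored detections. The detection layout (a dict with `label`, `score`, and `bbox` keys) and the inclusive comparison are assumptions made for illustration; TAO's internal representation may differ.

```python
def filter_by_confidence(detections, conf_threshold):
    """Keep detections whose score is at least `conf_threshold`."""
    return [det for det in detections if det["score"] >= conf_threshold]


# Hypothetical detections, for illustration only.
detections = [
    {"label": "helmet", "score": 0.91, "bbox": [10, 12, 48, 60]},
    {"label": "head", "score": 0.34, "bbox": [70, 15, 95, 44]},
    {"label": "person", "score": 0.58, "bbox": [5, 5, 120, 200]},
]
kept = filter_by_confidence(detections, conf_threshold=0.5)
print([det["label"] for det in kept])
```

Raising the threshold trades recall for precision: fewer boxes survive, but the ones that do are the predictions the model is most confident about.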

In [None]:
# Evaluate on TAO model
!tao model grounding_dino evaluate \
            -e $SPECS_DIR/evaluate.yaml \
            evaluate.checkpoint=$RESULTS_DIR/train/grounding_dino_model.pth \
            results_dir=$RESULTS_DIR

## 7. Visualize Inferences <a class="anchor" id="head-7"></a>
In this section, we run the `inference` tool to generate inferences on the trained models and visualize the results. The `inference` tool produces annotated image outputs and txt files that contain prediction information.

The `infer.yaml` specification file configures the inference parameters, including:

* model: configure the model setting
    * this config should remain the same as your trained model's configuration
* dataset: configure the dataset and augmentation methods
    * infer_data_sources:
        * image_dir: the list of directories for inference images
        * captions: list of phrases to run inference on. E.g. ["person", "black cat"]
    * batch_size
    * workers
* inference:
    * conf_threshold: the confidence score threshold
    * color_map: the color mapping for each phrase. The predicted bbox will be drawn with mapped color for each phrase

In [None]:
!tao model grounding_dino inference \
        -e $SPECS_DIR/infer.yaml \
        inference.checkpoint=$RESULTS_DIR/train/grounding_dino_model.pth \
        results_dir=$RESULTS_DIR/

In [None]:
# Simple grid visualizer
!pip3 install "matplotlib>=3.3.3, <4.0"
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg']

def visualize_images(output_path, num_cols=4, num_images=10):
    """Show up to `num_images` images from `output_path` in a grid."""
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr 2-D even when the grid has a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()
    # Sort the listing so the same images are shown on every run.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)

In [None]:
# Visualizing the sample images.
IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], "inference", "images_annotated")
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

## 8. Deploy <a class="anchor" id="head-8"></a>

In [None]:
# Export the model to ONNX model
!tao model grounding_dino export \
           -e $SPECS_DIR/export.yaml \
           export.checkpoint=$RESULTS_DIR/train/grounding_dino_model.pth \
           export.onnx_file=$RESULTS_DIR/export/grounding_dino_model.onnx \
           results_dir=$RESULTS_DIR/

In [None]:
# Generate TensorRT engine using tao deploy
!tao deploy grounding_dino gen_trt_engine -e $SPECS_DIR/gen_trt_engine.yaml \
                               gen_trt_engine.onnx_file=$RESULTS_DIR/export/grounding_dino_model.onnx \
                               gen_trt_engine.trt_engine=$RESULTS_DIR/gen_trt_engine/grounding_dino_model.engine \
                               results_dir=$RESULTS_DIR

In [None]:
# Evaluate with generated TensorRT engine
!tao deploy grounding_dino evaluate -e $SPECS_DIR/evaluate.yaml \
                              evaluate.trt_engine=$RESULTS_DIR/gen_trt_engine/grounding_dino_model.engine \
                              results_dir=$RESULTS_DIR/

In [None]:
# Inference with generated TensorRT engine
!tao deploy grounding_dino inference -e $SPECS_DIR/infer.yaml \
                              inference.trt_engine=$RESULTS_DIR/gen_trt_engine/grounding_dino_model.engine \
                              results_dir=$RESULTS_DIR/

In [None]:
# Visualizing the sample images.
IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], "trt_inference", "images_annotated")
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

This notebook has come to an end.