# Text to Bounding Box Auto-labeling through TAO Data Services

[Grounding DINO](https://arxiv.org/abs/2303.05499) is a state-of-the-art open-set object detection model based on DINO. Grounding DINO can detect arbitrary objects from human inputs such as category names or referring expressions.


### Sample prediction of Swin-Base + Grounding DINO model
<img align="center" src="sample.jpg" width="960">

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Create an annotated object detection dataset from a directory of images using the TAO Grounding DINO model
* Extend this object detection dataset with per-instance segmentation masks generated from the bounding boxes using the Mask Auto Label model in TAO

For the inference and deployment workflow, please refer to the zero-shot inference notebook.

## Table of Contents

This notebook shows an example use case of Grounding DINO using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Generate pseudo-boxes with Grounding DINO](#head-3)
4. [Visualize pseudo-boxes](#head-4)
5. [Convert ODVG dataset to COCO format](#head-5)
6. [Generate instance segmentation masks [Optional]](#head-6)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when trying to load them as pretrained models.

This notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` to the path of the user's workspace. Please note that the dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment will be output to `$LOCAL_PROJECT_DIR/text2box/results`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. You should configure it for your specific case so that these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
os.environ["LOCAL_PROJECT_DIR"] = FIXME

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "text2box", "results")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/text2box

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

print(f"Configuration files are available at {os.environ['HOST_SPECS_DIR']}")

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       },
       {
           "source": "~/.cache",
           "destination": "/.cache"
       }
   ],
   "DockerOptions": {
        "shm_size": "64G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
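
Because the launcher can only see directories that are mapped, it can help to confirm that every `source` path in the mounts file exists on the host before starting a job. The following is a minimal sketch, assuming the `Mounts` layout with `source` and `destination` keys written above; `missing_mount_sources` is a hypothetical helper and not part of the TAO launcher.

```python
import json
import os


def missing_mount_sources(config):
    """Return the `source` paths in a mounts config that do not exist on the host."""
    missing = []
    for mount in config.get("Mounts", []):
        # Expand "~" so entries such as "~/.cache" resolve to a real path.
        source = os.path.expanduser(mount["source"])
        if not os.path.isdir(source):
            missing.append(mount["source"])
    return missing


# Only inspect the file if it has already been written.
mounts_path = os.path.expanduser("~/.tao_mounts.json")
if os.path.exists(mounts_path):
    with open(mounts_path) as mfile:
        print("Missing mount sources:", missing_mount_sources(json.load(mfile)))
```

An empty list means every mapped host directory is present.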

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of python to be used in the virtual env through the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running:

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where `x` is >= 6 and <= 8.

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 455+
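
Part of the list above can be spot-checked from Python. The sketch below assumes the `python >=3.7, <=3.10.x` range from that list and only tests whether a `docker` executable is on `PATH`; it does not verify Docker, driver, or container toolkit versions. `python_version_supported` is a hypothetical helper name.

```python
import shutil
import sys


def python_version_supported(version_info, minimum=(3, 7), maximum=(3, 10)):
    """Check a (major, minor, ...) version tuple against an inclusive range."""
    major_minor = tuple(version_info[:2])
    return minimum <= major_minor <= maximum


print("Python version supported:", python_version_supported(sys.version_info))
# shutil.which returns None when the executable is not found on PATH.
print("docker found on PATH:", shutil.which("docker") is not None)
```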

Once you have installed the prerequisites, please log in to the docker registry nvcr.io using the command below:

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.

In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info --verbose

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

We will be using the COCO dataset for this tutorial. The following script downloads the COCO dataset automatically.

In [None]:
# Create local dir
!mkdir -p $HOST_DATA_DIR
# Download the data
!bash $HOST_SPECS_DIR/download_coco.sh $HOST_DATA_DIR

In [None]:
# Verification
!ls -l $HOST_DATA_DIR/raw-data

### 2.2 Download pre-trained model

We will download the original Grounding DINO Swin-Base model from GitHub. For more details about the model, please refer to [https://github.com/IDEA-Research/GroundingDINO/tree/main](https://github.com/IDEA-Research/GroundingDINO/tree/main).

In [None]:
# download a checkpoint
!mkdir -p $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/
!wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/swin_base.pth

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_PROJECT_DIR/text2box/pretrained_grounding_dino_vswin_base/
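
An interrupted `wget` can leave a truncated file behind, so listing the directory alone may not be enough. As a rough sketch, the size and SHA-256 digest of the checkpoint can be printed for comparison. This notebook does not provide a reference checksum, so the digest is only useful against a value you obtain separately, and `file_summary` is a hypothetical helper.

```python
import hashlib
import os


def file_summary(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Path used by the download cell above.
ckpt = os.path.join(
    os.getenv("LOCAL_PROJECT_DIR", os.getcwd()),
    "text2box", "pretrained_grounding_dino_vswin_base", "swin_base.pth",
)
if os.path.exists(ckpt):
    size, sha256 = file_summary(ckpt)
    print(f"{ckpt}: {size / 1e6:.1f} MB, sha256={sha256}")
```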

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

## 3. Generate pseudo-boxes with Grounding DINO <a class="anchor" id="head-3"></a>

In [None]:
!cat $HOST_SPECS_DIR/autolabel.yaml

In [None]:
print("For multi-GPU, change `gpu_ids` in autolabel.yaml based on your machine.")
!tao dataset auto_label generate -e $SPECS_DIR/autolabel.yaml

## 4. Visualize pseudo-boxes <a class="anchor" id="head-4"></a>

In [None]:
# Simple grid visualizer
!pip3 install "matplotlib>=3.3.3, <4.0"
import matplotlib.pyplot as plt
import os
from math import ceil
valid_image_ext = ['.jpg']

def visualize_images(output_path, num_cols=4, num_images=10):
    """Show up to `num_images` images from `output_path` in a grid."""
    num_rows = int(ceil(float(num_images) / float(num_cols)))
    # squeeze=False keeps axarr two-dimensional even for a single row or column.
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[80,30], squeeze=False)
    f.tight_layout()

    # Sort the listing so the same images are shown on every run.
    a = [os.path.join(output_path, image) for image in sorted(os.listdir(output_path))
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)
        axarr[row_id, col_id].axis("off")

In [None]:
# Visualizing the sample images.
IMAGE_DIR = os.path.join(os.environ['HOST_RESULTS_DIR'], "images_annotated")
COLS = 2 # number of columns in the visualizer grid.
IMAGES = 4 # number of images to visualize.

visualize_images(IMAGE_DIR, num_cols=COLS, num_images=IMAGES)

If you find that some instances are missing after this initial run, you may add another iteration to the `iteration_scheduler`. For closed-set detection, we usually recommend increasing the `conf_threshold` to a higher value than in your previous iteration. Please note that iterative auto-labeling for a closed-set problem may not always yield the best result compared to a phrase grounding task.

## 5. Convert ODVG dataset to COCO format <a class="anchor" id="head-5"></a>

In [None]:
!cat $HOST_SPECS_DIR/convert.yaml

In [None]:
# Convert ODVG to COCO
!tao dataset annotations convert -e $SPECS_DIR/convert.yaml
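
To make the conversion concrete, the sketch below shows the core of an ODVG-to-COCO transformation in plain Python. It is an illustration only, not TAO's converter. It assumes the ODVG detection layout used by open-source Grounding DINO training code (one JSON object per line with `filename`, `height`, `width`, and `detection.instances` holding corner-format `bbox`, `label`, and `category`); the fields TAO actually emits, and how it maps label IDs, may differ. `odvg_to_coco` is a hypothetical function.

```python
import json


def odvg_to_coco(odvg_lines):
    """Convert ODVG detection records (one JSON object per line) to a COCO dict."""
    images, annotations, categories = [], [], {}
    for image_id, line in enumerate(odvg_lines, start=1):
        record = json.loads(line)
        images.append({"id": image_id, "file_name": record["filename"],
                       "height": record["height"], "width": record["width"]})
        for inst in record["detection"]["instances"]:
            x1, y1, x2, y2 = inst["bbox"]  # assumed [x1, y1, x2, y2] corners
            categories[inst["label"]] = inst["category"]
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": inst["label"],
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, width, height]
                "area": (x2 - x1) * (y2 - y1),
                "iscrowd": 0,
            })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": k, "name": v} for k, v in sorted(categories.items())],
    }


# Toy record for illustration; real ODVG files come from the auto-labeling step.
sample = ['{"filename": "a.jpg", "height": 100, "width": 200, "detection": '
          '{"instances": [{"bbox": [10, 20, 40, 60], "label": 1, "category": "person"}]}}']
print(odvg_to_coco(sample)["annotations"][0]["bbox"])
```

For real datasets, use the `tao dataset annotations convert` command shown above.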

In [None]:
!pip3 install pycocotools

In [None]:
# Check the converted COCO JSON file can be loaded through pycocotools
from pycocotools.coco import COCO

c = COCO(os.path.join(os.environ['HOST_RESULTS_DIR'], "final_annotation.json"))
# Get annotations of only person class
anns = c.loadAnns(c.getAnnIds(catIds=[1]))

TAO also supports converting the dataset to the KITTI format. For more information on how to convert the COCO formatted annotations to KITTI, refer to the [Data Services - Annotations](https://docs.nvidia.com/tao/tao-toolkit/text/data_services/annotations.html#annotations) section in the TAO documentation.
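
For orientation, a KITTI label line has 15 space-separated fields: type, truncated, occluded, alpha, four box coordinates (left, top, right, bottom), three 3D dimensions, three 3D location values, and rotation_y. The sketch below maps one COCO `[x, y, width, height]` box to such a line, filling the fields that a 2D detection dataset does not provide with zeros. It is an illustration rather than TAO's converter; `coco_bbox_to_kitti_line` is a hypothetical helper, and replacing spaces in category names is an assumption made to keep the line parseable.

```python
def coco_bbox_to_kitti_line(category_name, bbox):
    """Format one COCO [x, y, width, height] box as a KITTI label line."""
    x, y, w, h = bbox
    left, top, right, bottom = x, y, x + w, y + h
    fields = [
        category_name.replace(" ", "_"),  # assumption: keep the type a single token
        "0.00",  # truncated (unknown here)
        "0",     # occluded (unknown here)
        "0.00",  # alpha (unused for 2D detection)
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
        "0.00", "0.00", "0.00",  # 3D dimensions (unused)
        "0.00", "0.00", "0.00",  # 3D location (unused)
        "0.00",  # rotation_y (unused)
    ]
    return " ".join(fields)


print(coco_bbox_to_kitti_line("person", [10, 20, 30, 40]))
```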

## 6. Generate instance segmentation masks [Optional] <a class="anchor" id="head-6"></a>

Now that you have generated bounding box annotations using Grounding DINO, you can use these annotations to generate instance segmentation masks using MaskAutoLabel (MAL).

### A. Download mask autolabel model.

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"


# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm -f $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", "")) 

In [None]:
# List available models
!ngc registry model list nvidia/tao/mask_auto_label:*

In [None]:
# Download the model
!ngc registry model download-version nvidia/tao/mask_auto_label:trainable_v1.0 --dest $LOCAL_PROJECT_DIR/text2box/

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_PROJECT_DIR/text2box/mask_auto_label_vtrainable_v1.0

In [None]:
!cat $HOST_SPECS_DIR/segmentation_autolabel.yaml

In [None]:
print("For multi-GPU, change `gpus` in segmentation_autolabel.yaml based on your machine.")
!tao dataset auto_label generate -e $SPECS_DIR/segmentation_autolabel.yaml

In [None]:
print("Check the pseudo label:")
!ls -l $HOST_RESULTS_DIR

In [None]:
# install deps
# Quote the requirement so the shell does not treat ">" as an output redirect.
!pip3 install "Cython>=0.29.36"
!pip3 install numpy
!pip3 install pillow

In [None]:
import os
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools.coco import COCO
%matplotlib inline

image_dir = os.path.join(os.environ["HOST_DATA_DIR"], 'raw-data/val2017')
json_path = os.path.join(os.environ["HOST_RESULTS_DIR"], 'final_instance_annotation.json')
coco_mal = COCO(annotation_file=json_path)

# Restricting to only 5 images that contain at least one object of category_id 3
# for ease of visualization.
for i in coco_mal.getImgIds(catIds=[3])[:5]:
    img_info = coco_mal.loadImgs(i)[0]
    img_file_name = img_info["file_name"]
    print(img_file_name)
    ann_ids = coco_mal.getAnnIds(imgIds=[i], iscrowd=None)
    anns = coco_mal.loadAnns(ann_ids)
    # raw image
    im = Image.open(os.path.join(image_dir, img_file_name))
    # plots
    fig = plt.figure(figsize = (10,10))
    ax1 = fig.add_subplot(211)
    ax1.imshow(np.asarray(im))
    ax2 = fig.add_subplot(212)
    ax2.imshow(np.asarray(im), aspect='auto')
    coco_mal.showAnns(anns, draw_bbox=False)

You have now successfully used Grounding DINO and MAL to generate an open-vocabulary object detection and instance segmentation dataset. You may use this dataset to train any object detection or instance segmentation network in TAO.
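
Before training on pseudo-labels, a quick sanity pass over the annotation file can catch obvious problems. The sketch below counts annotations per category and flags masks whose recorded `area` exceeds the area of their bounding box, which would typically be unexpected for masks derived from boxes. It assumes standard COCO fields (`categories`, `annotations`, `bbox`, `area`); whether the generated file populates `area` is not confirmed here, and `summarize_coco` is a hypothetical helper.

```python
from collections import Counter


def summarize_coco(coco_dict, tolerance=1.01):
    """Count annotations per category and flag masks larger than their box."""
    names = {c["id"]: c["name"] for c in coco_dict.get("categories", [])}
    counts = Counter()
    suspicious = []
    for ann in coco_dict.get("annotations", []):
        counts[names.get(ann["category_id"], ann["category_id"])] += 1
        _, _, w, h = ann["bbox"]
        # A mask derived from a box should fit inside it; the tolerance
        # absorbs small rounding differences.
        if "area" in ann and ann["area"] > w * h * tolerance:
            suspicious.append(ann["id"])
    return counts, suspicious


# Toy data for illustration; load the real file with json.load instead.
toy = {
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [
        {"id": 1, "category_id": 1, "bbox": [0, 0, 10, 10], "area": 60},
        {"id": 2, "category_id": 1, "bbox": [0, 0, 10, 10], "area": 150},
    ],
}
counts, suspicious = summarize_coco(toy)
print(dict(counts), suspicious)
```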