# Weakly supervised instance segmentation using TAO Mask Auto-labeler

[Mask Auto-labeler (MAL)](https://arxiv.org/abs/2301.03992) is a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes
box-cropped images as inputs and conditionally generates their mask pseudo-labels.

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. 

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">

## Sample prediction of MAL model
<img align="center" src="https://github.com/vpraveen-nv/model_card_images/blob/main/cv/notebook/common/mal_sample.jpg?raw=true" width="960">

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Take a pretrained model and train a MAL model on the COCO dataset
* Evaluate the trained model
* Run inference with the trained model and visualize the result

## Table of Contents

This notebook shows an example use case of MAL using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and download pretrained model](#head-2)
3. [Provide experiment spec file](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate a trained model](#head-5)
6. [Run inference](#head-6)

## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, please make sure to set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when trying to load them as pretrained models.

This notebook requires the user to set an env variable called `$LOCAL_PROJECT_DIR` to the path of the user's workspace. Please note that the dataset used by this notebook is expected to reside in `$LOCAL_PROJECT_DIR/data`, while the collaterals generated by the TAO experiment will be output to `$LOCAL_PROJECT_DIR/mal/`. More information on how to set up the dataset and the supported steps in the TAO workflow is provided in the subsequent cells.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json` file. Here, we map the directories in which we save the data, specs, results and cache. You should configure it for your specific case so that these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=/path/to/local/tao-experiments

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "data")
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "mal")

# Set this path if you don't run the notebook from the samples directory.
# %env NOTEBOOK_ROOT=~/tao-samples/mal

# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)

In [None]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [None]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tlt_configs = {
   "Mounts":[
         # Mapping the Local project directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       }
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         },
        # "user": "{}:{}".format(os.getuid(), os.getgid()),
        "network": "host"
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tlt_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
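
Before moving on, it can help to confirm that every host directory listed in the mounts file actually exists, since a missing source directory tends to surface later as a confusing error inside the container. The following is a minimal sketch, not part of the TAO launcher: the helper name `missing_mount_sources` is hypothetical, and it only assumes the `"Mounts"` / `"source"` layout written by the cell above.

```python
import json
import os


def missing_mount_sources(config):
    """Return the "source" paths of a mounts config that do not exist on the host."""
    return [
        mount["source"]
        for mount in config.get("Mounts", [])
        if not os.path.isdir(mount["source"])
    ]


mounts_path = os.path.expanduser("~/.tao_mounts.json")
if os.path.isfile(mounts_path):
    with open(mounts_path) as mfile:
        missing = missing_mount_sources(json.load(mfile))
    print("Missing host directories:", missing or "none")
else:
    print("No mounts file found at", mounts_path)
```

An empty result means all mapped directories are present on the host; it does not verify that the destinations are what your spec file expects.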

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in the `nvidia-pyindex` python index. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with a Python version that satisfies the software requirements listed below. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a Python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, please set the version of Python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```
where `x` is the minor version of a Python release within the supported range listed below.

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing the TAO Python package, please make sure the following software requirements are met:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver >= 455
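
As a quick sanity check, the Python requirement above can be tested from inside the virtual env before installing the launcher. This is a small sketch only: the bounds are copied from the list above, and the helper name `python_supported` is hypothetical.

```python
import sys

# Bounds copied from the requirement list above (python >=3.7, <=3.10.x).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def python_supported(version=None):
    """Return True if the (major, minor) version falls within the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


print("Python", sys.version.split()[0], "supported:", python_supported())
```

The Docker and NVIDIA driver requirements are not covered by this check and still need to be verified separately.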

Once you have installed the prerequisites, please log in to the docker registry nvcr.io using the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [None]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao

In [None]:
# View the versions of the TAO launcher
!tao info

## 2. Prepare dataset and download pretrained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

We will be using the COCO dataset for this tutorial. The following script downloads the COCO dataset automatically.

In [None]:
# Create local dir
!mkdir -p $HOST_DATA_DIR
# Download the data
!bash $HOST_SPECS_DIR/download_coco.sh $HOST_DATA_DIR

In [None]:
# Verification
!ls -l $HOST_DATA_DIR/raw-data
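
If you prefer a programmatic check over reading the `ls` output, the sketch below reports which expected sub-directories are missing. Please treat the directory names as assumptions: only `raw-data/val2017` is referenced later in this notebook, while `train2017` and `annotations` are the usual COCO 2017 names and may differ from what `download_coco.sh` produces. The helper name `missing_entries` is hypothetical.

```python
import os

# Assumed layout: only raw-data/val2017 is referenced later in this notebook.
# train2017 and annotations are the usual COCO 2017 names and may differ.
EXPECTED_ENTRIES = ("train2017", "val2017", "annotations")


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected sub-directories that are absent under root."""
    return [name for name in expected if not os.path.isdir(os.path.join(root, name))]


raw_data = os.path.join(os.environ.get("HOST_DATA_DIR", "."), "raw-data")
print("Missing under", raw_data, ":", missing_entries(raw_data) or "none")
```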

### 2.2 Download pretrained model

In [None]:
# Installing NGC CLI on the local machine.
## Download and install
import os
import platform

if platform.machine() == "x86_64":
    os.environ["CLI"]="ngccli_linux.zip"
else:
    os.environ["CLI"]="ngccli_arm64.zip"


# Remove any previously existing CLI installations
!rm -rf $LOCAL_PROJECT_DIR/ngccli/*
!wget "https://ngc.nvidia.com/downloads/$CLI" -P $LOCAL_PROJECT_DIR/ngccli
!unzip -u "$LOCAL_PROJECT_DIR/ngccli/$CLI" -d $LOCAL_PROJECT_DIR/ngccli/
!rm -f $LOCAL_PROJECT_DIR/ngccli/*.zip 
os.environ["PATH"]="{}/ngccli/ngc-cli:{}".format(os.getenv("LOCAL_PROJECT_DIR", ""), os.getenv("PATH", ""))
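
Since the cell above only prepends the CLI directory to `PATH`, a failed download or unzip would otherwise go unnoticed until the first `ngc` call. A minimal sketch for checking that the executable is resolvable, using only the standard library (the helper name `cli_on_path` is hypothetical):

```python
import shutil


def cli_on_path(name="ngc"):
    """Return the resolved path of an executable, or None if it is not on PATH."""
    return shutil.which(name)


print("ngc found at:", cli_on_path())
```

If this prints `None`, please re-run the installation cell and check that `$LOCAL_PROJECT_DIR/ngccli/ngc-cli` exists.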

In [None]:
# List available pretrained models
!ngc registry model list nvidia/tao/pretrained_mask_auto_label:*

In [None]:
# Pull pretrained model from NGC
!ngc registry model download-version nvidia/tao/pretrained_mask_auto_label:vit-base --dest $LOCAL_PROJECT_DIR

In [None]:
print("Check that model is downloaded into dir.")
!ls -l $LOCAL_PROJECT_DIR/pretrained_mask_auto_label_vvit-base

## 3. Provide experiment spec file <a class="anchor" id="head-3"></a>

We provide a specification file to configure the key parameters for this demo including:

* experiment config: configure the global experiment settings
    * num_nodes: number of nodes (num_nodes=1 for single node)
    * results_dir: the directory where your checkpoints will be saved
    * checkpoint: pretrained weights (can be either a pretrained backbone model or a trained MAL model)
* dataset config: configure the training and validation datasets
    * train_img_dir: the root directory for train images
    * train_ann_path: annotation file for train data. required to be in COCO json format
    * val_img_dir: the root directory for validation images
    * val_ann_path: annotation file for validation data. required to be in COCO json format
* model config: configure the model setting
    * arch: the backbone architecture for MAL
* train_config: configure the training hyperparameters
    * lr: learning rate for training the model
    * batch_size: batch size per GPU
    * use_amp: whether to use automatic mixed precision (AMP)
    * max_epochs: number of epochs
    * crop_size: input bounding box size

* **Note that the sample spec is not meant to produce SOTA accuracy on COCO. To reproduce SOTA, you might want to use TAO to train an ImageNet model first and follow the original parameters for COCO.**

Please refer to the TAO documentation about MAL to get all the parameters that are configurable.
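
To make the structure described above more concrete, here is a minimal sketch that checks a spec, represented as a nested Python dict, for a subset of the parameters listed above. The key names follow the bullets; the section names `dataset` and `model` are assumptions (only `train`, `evaluate` and `inference` appear as prefixes in the commands of this notebook), and every value in `example_spec` is a hypothetical placeholder rather than a recommended setting.

```python
# Key names follow the bullets above. The section names "dataset" and "model"
# are assumptions; "train" matches the train.num_gpus setting mentioned later.
REQUIRED_KEYS = {
    "dataset": ["train_img_dir", "train_ann_path", "val_img_dir", "val_ann_path"],
    "model": ["arch"],
    "train": ["lr", "batch_size", "use_amp", "max_epochs"],
}


def missing_spec_keys(spec, required=REQUIRED_KEYS):
    """Return dotted names of required keys that are absent from a nested spec dict."""
    missing = []
    for section, keys in required.items():
        present = spec.get(section) or {}
        missing.extend(f"{section}.{key}" for key in keys if key not in present)
    return missing


# Hypothetical values, for illustration only.
example_spec = {
    "dataset": {
        "train_img_dir": "/data/raw-data/train2017",
        "train_ann_path": "/data/raw-data/annotations/instances_train2017.json",
        "val_img_dir": "/data/raw-data/val2017",
    },
    "model": {"arch": "<backbone-name>"},
    "train": {"lr": 1e-5, "batch_size": 4, "use_amp": True, "max_epochs": 10},
}
print(missing_spec_keys(example_spec))  # ['dataset.val_ann_path']
```

The spec file shipped with the samples remains the reference; this check is only meant to catch a forgotten path before a long training run starts.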


In [None]:
!cat $HOST_SPECS_DIR/spec.yaml

## 4. Run TAO training <a class="anchor" id="head-4"></a>
* WARNING: COCO training takes more than 40 hours to complete on 8 V100 GPUs. We therefore **highly recommend that you run training with multiple high-end GPUs (e.g. V100, A100)**

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR=/data
%env SPECS_DIR=/specs
%env RESULTS_DIR=/results

In [None]:
print("For multi-GPU, change train.num_gpus in spec.yaml based on your machine.")
!tao model mal train -e $SPECS_DIR/spec.yaml

In [None]:
print('Model checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/train/

In [None]:
# You can set NUM_EPOCH to the epoch corresponding to any saved checkpoint
# %env NUM_EPOCH=029

# Get the name of the checkpoint corresponding to your set epoch
# tmp=!ls $HOST_RESULTS_DIR/train/*.pth | grep epoch_$NUM_EPOCH
# %env CHECKPOINT={tmp[0]}

# Or get the latest checkpoint
os.environ["CHECKPOINT"] = os.path.join(os.getenv("HOST_RESULTS_DIR"), "train/mal_model_latest.pth")

print('Rename a trained model: ')
print('---------------------')
!cp $CHECKPOINT $HOST_RESULTS_DIR/train/mal_model.ckpt
!ls -ltrh $HOST_RESULTS_DIR/train/mal_model.ckpt
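
The commented-out shell lines in the cell above select a checkpoint by epoch with `ls` and `grep`. The same selection can be sketched in pure Python, which makes the "no match" case explicit. This sketch assumes that checkpoint file names contain `epoch_<NNN>` with a three-digit, zero-padded epoch (as suggested by the `029` example) and end in `.pth`; the helper name `find_checkpoint` is hypothetical.

```python
import glob
import os


def find_checkpoint(train_dir, epoch=None):
    """Return a checkpoint path under train_dir, or None if nothing matches.

    With epoch=None, the latest-checkpoint name used in this notebook is
    returned. Otherwise the first *.pth file whose name contains
    "epoch_<NNN>" (zero-padded to three digits) is returned.
    """
    if epoch is None:
        latest = os.path.join(train_dir, "mal_model_latest.pth")
        return latest if os.path.isfile(latest) else None
    pattern = os.path.join(train_dir, f"*epoch_{epoch:03d}*.pth")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


train_dir = os.path.join(os.environ.get("HOST_RESULTS_DIR", "."), "train")
print("Latest checkpoint:", find_checkpoint(train_dir))
```

Calling `find_checkpoint(train_dir, epoch=29)` would look for a file matching `*epoch_029*.pth` and return `None` if training did not save that epoch.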

## 5. Evaluate a trained model <a class="anchor" id="head-5"></a>

In this section, we run the `evaluate` tool to evaluate the trained model and produce the mIOU metric.

In `spec.yaml`, we specify a few key parameters for evaluation including:
* experiment config
    * gpu_ids: gpu indices to use
    * checkpoint: a trained MAL model
* model config: configure the model setting
    * arch: the backbone architecture for MAL
* dataset config: configure the training and validation datasets
    * val_img_dir: the root directory for validation images
    * val_ann_path: annotation file for validation data. required to be in COCO json format

In [None]:
# Evaluate on TAO model
!tao model mal evaluate -e $SPECS_DIR/spec.yaml evaluate.checkpoint=$RESULTS_DIR/train/mal_model.ckpt
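
For readers unfamiliar with the metric, the sketch below shows the standard definition of intersection-over-union (IoU) between a predicted and a reference binary mask, and its mean over several mask pairs. It is a pure-Python illustration of the general idea only; how the `evaluate` tool computes and aggregates its metric internally is not shown in this notebook, and the helper names are hypothetical.

```python
def mask_iou(pred, target):
    """Intersection-over-union of two binary masks given as equally sized nested lists."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match (a convention chosen here).
    return intersection / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over a sequence of (pred, target) mask pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("mean_iou needs at least one mask pair")
    return sum(scores) / len(scores)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 0.5
```

In the example, two pixels are set in both masks and four pixels are set in at least one of them, which gives 2 / 4 = 0.5.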

## 6. Run Inference <a class="anchor" id="head-6"></a>
In this section, we run the `inference` tool to generate predictions with the trained model and visualize the results. The `inference` tool produces an output annotation json file with pseudo-mask info.

In `spec.yaml`, we specify a few key parameters for inference including:
* experiment config
    * gpu_ids: gpu indices to use
    * checkpoint: a trained MAL model
* model config: configure the model setting
    * arch: the backbone architecture for MAL
* dataset config: configure the training and validation datasets
    * val_img_dir: the root directory for validation images
    * val_ann_path: annotation file for validation data. required to be in COCO json format
* inference config: configure the data and output for inference
    * img_dir: the root directory for test images
    * ann_path: annotation file for test data. required to be in COCO json format
    * label_dump_path: the output json file with pseudo-mask info

In [None]:
!tao model mal inference -e $SPECS_DIR/spec.yaml inference.checkpoint=$RESULTS_DIR/train/mal_model.ckpt
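
Before visualizing, a quick summary of the output file can confirm that inference actually wrote masks. The sketch below uses only the standard library and assumes the output follows the usual COCO layout, with an `annotations` list whose entries carry `category_id` and `segmentation` fields. The file name is the one read by the visualization cell in this notebook, and the helper name `summarize_annotations` is hypothetical.

```python
import json
import os
from collections import Counter


def summarize_annotations(coco):
    """Count annotations per category_id and how many carry a non-empty segmentation."""
    annotations = coco.get("annotations", [])
    per_category = Counter(ann["category_id"] for ann in annotations)
    with_mask = sum(1 for ann in annotations if ann.get("segmentation"))
    return per_category, with_mask


# Same output path that the visualization cell in this notebook reads.
json_path = os.path.join(
    os.environ.get("HOST_RESULTS_DIR", "."), "instances_val2017_mal.json"
)
if os.path.isfile(json_path):
    with open(json_path) as f:
        per_category, with_mask = summarize_annotations(json.load(f))
    print(sum(per_category.values()), "annotations,", with_mask, "with masks")
else:
    print("Inference output not found at", json_path)
```

If the number of annotations with masks is zero, please check `label_dump_path` in the spec file and the inference logs before continuing.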

### 6.1. Visualize the result <a class="anchor" id="head-6-1"></a>

In [None]:
# install deps
!pip3 install Cython==0.29.36
!pip3 install numpy
!pip3 install pillow
!pip3 install "matplotlib>=3.3.3, <4.0"
!pip3 install pycocotools

In [None]:
import os
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from PIL import Image
from pycocotools.coco import COCO
%matplotlib inline

img_dir = os.path.join(os.environ['HOST_DATA_DIR'], 'raw-data/val2017/')
json_path = os.path.join(os.environ['HOST_RESULTS_DIR'], 'instances_val2017_mal.json')
data = COCO(annotation_file=json_path)
cat_ids = data.getCatIds()
query_id = cat_ids[0] # pick the 1st category
# Get the image ids containing the object of the category.
img_ids = data.getImgIds(catIds=[query_id])
# Pick 1st image
img_id = img_ids[0]
img_info = data.loadImgs([img_id])[0]
img_file_name = img_info["file_name"]
print(img_file_name)
ann_ids = data.getAnnIds(imgIds=[img_id], iscrowd=None)
anns = data.loadAnns(ann_ids)
plt.clf()
im = Image.open(os.path.join(img_dir, img_file_name))
plt.axis("off")
plt.imshow(np.asarray(im))
data.showAnns(anns, draw_bbox=True)
plt.show()

This notebook has come to an end.