# Foundational Model Fine-tuning using TAO Classification PyT

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is retrained for use on a different task.

Train Adapt Optimize (TAO) Toolkit is a simple and easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with users' own data.

<img align="center" src="https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png" width="1080">


## What is DinoV2?

NV-Dinov2 is a visual foundational model trained on an NVIDIA proprietary large-scale dataset. DinoV2 is a self-supervised learning (SSL) method that combines two SSL techniques: DINO and iBOT. Such models can greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without fine-tuning. Trained on large curated datasets, the model has learned robust fine-grained representations useful for localization and classification tasks. It can be used as a foundation model for a variety of downstream tasks with few labeled examples.

## What is CLIP?

CLIP (Contrastive Language-Image Pretraining) is a deep learning model developed by OpenAI. It's designed to understand images and text together in a way that allows it to perform a wide array of tasks. Unlike traditional computer vision models that are trained solely on images, or natural language models that are trained only on text, CLIP is trained on a large dataset containing both images and their associated textual descriptions.

This allows CLIP to perform tasks like image classification, text-based image retrieval, and even generate textual descriptions for images. The key innovation is the use of a contrastive learning framework, which helps the model learn to associate images with their descriptions. See the [OpenAI CLIP page](https://openai.com/research/clip) for more information.
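
To make the idea of associating images with descriptions concrete, the sketch below scores one image embedding against several text embeddings by cosine similarity and turns the scores into probabilities with a softmax. The three-dimensional embeddings are made-up toy values (real CLIP embeddings come from its image and text encoders), and the helper names and the `temperature` value are illustrative assumptions, not part of TAO or OpenCLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_embedding, text_embeddings, temperature=0.07):
    """Return {description: probability} for one image embedding."""
    labels = list(text_embeddings)
    logits = [
        cosine_similarity(image_embedding, text_embeddings[label]) / temperature
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Toy embeddings, invented for illustration only.
toy_image = [0.9, 0.1, 0.0]
toy_texts = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}
probabilities = classify(toy_image, toy_texts)
print(max(probabilities, key=probabilities.get))
```

The description whose embedding points in the direction closest to the image embedding receives the highest probability, which is the same comparison a contrastively trained model relies on at inference time.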

## Learning Objectives

In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:

* Finetune a vit_b_32 Open CLIP model on the ImageNet-1k dataset
* Evaluate the trained model.
* Run Inference on the trained model.
* Export the trained model to a .onnx file for deployment to DeepStream.

At the end of this notebook, you will have generated a trained and optimized `classification` model
which you may deploy via [Triton](https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps)
or [DeepStream](https://developer.nvidia.com/deepstream-sdk).

## Table of Contents

This notebook shows an example use case of Classification using Train Adapt Optimize (TAO) Toolkit.

0. [Set up env variables and map drives](#head-0)
1. [Installing the TAO launcher](#head-1)
2. [Prepare dataset and pre-trained model](#head-2)
3. [Provide training specification](#head-3)
4. [Run TAO training](#head-4)
5. [Evaluate trained models](#head-5)
6. [Inferences](#head-6)
7. [Deploy](#head-7)


## 0. Set up env variables and map drives <a class="anchor" id="head-0"></a>

When using the purpose-built pretrained models from NGC, set the `$KEY` environment variable to the key mentioned in the model overview. Failing to do so can lead to errors when loading them as pretrained models.

The TAO launcher uses docker containers under the hood, and **for our data and results directory to be visible to the docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the Environment Variables and amount of Shared Memory available to the TAO launcher. <br>

`IMPORTANT NOTE:` The code below creates a sample `~/.tao_mounts.json`  file. Here, we can map directories in which we save the data, specs, results and cache. You should configure it for your specific case so these directories are correctly visible to the docker container.


In [None]:
import os

# Please define this local project directory that needs to be mapped to the TAO docker session.
%env LOCAL_PROJECT_DIR=FIXME

os.environ["HOST_DATA_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()))
os.environ["HOST_RESULTS_DIR"] = os.path.join(os.getenv("LOCAL_PROJECT_DIR", os.getcwd()), "cls_pyt_fm")

# Set this path if you don't run the notebook from the samples directory.
# The sample spec files are present in the same path as the downloaded samples.
os.environ["HOST_SPECS_DIR"] = os.path.join(
    os.getenv("NOTEBOOK_ROOT", os.getcwd()),
    "specs"
)
# Point to the 'deps' folder in the samples directory from which you launched this notebook (inside the classification folder).
os.environ["PROJECT_DIR"]=FIXME
# Set the number of GPUs to use.
%env NUM_GPUS = 1

In [2]:
! mkdir -p $HOST_DATA_DIR
! mkdir -p $HOST_SPECS_DIR
! mkdir -p $HOST_RESULTS_DIR

In [28]:
# Mapping up the local directories to the TAO docker.
import json
import os
mounts_file = os.path.expanduser("~/.tao_mounts.json")
tao_configs = {
   "Mounts":[
       # Mapping the data directory
       {
           "source": os.environ["LOCAL_PROJECT_DIR"],
           "destination": "/workspace/tao-experiments"
       },
       {
           "source": os.environ["HOST_DATA_DIR"],
           "destination": "/data"
       },
       {
           "source": os.environ["HOST_SPECS_DIR"],
           "destination": "/specs"
       },
       {
           "source": os.environ["HOST_RESULTS_DIR"],
           "destination": "/results"
       },
   ],
   "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
         }
   }
}
# Writing the mounts file.
with open(mounts_file, "w") as mfile:
    json.dump(tao_configs, mfile, indent=4)

In [None]:
!cat ~/.tao_mounts.json
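
Because TAO commands run inside the container, the paths they write into their outputs use container destinations such as `/data` rather than host paths. The sketch below shows one way to map a container path back to the host from a list of mounts. The helper name `to_host_path` and the example host directories are hypothetical; only the `source` and `destination` keys come from the config written above.

```python
from pathlib import PurePosixPath


def to_host_path(container_path, mounts):
    """Map a path inside the container back to the host.

    `mounts` is a list of {"source": host_dir, "destination": container_dir}
    dicts, the same shape as the "Mounts" entry in ~/.tao_mounts.json.
    When several destinations match, the deepest one wins.
    """
    path = PurePosixPath(container_path)
    best = None
    for mount in mounts:
        dest = PurePosixPath(mount["destination"])
        if path == dest or dest in path.parents:
            if best is None or len(dest.parts) > len(PurePosixPath(best["destination"]).parts):
                best = mount
    if best is None:
        raise ValueError(f"{container_path} is not under any mounted destination")
    relative = path.relative_to(PurePosixPath(best["destination"]))
    return str(PurePosixPath(best["source"]) / relative)


# Hypothetical host directories, for illustration only.
example_mounts = [
    {"source": "/home/user/tao", "destination": "/data"},
    {"source": "/home/user/tao/cls_pyt_fm", "destination": "/results"},
]
print(to_host_path("/data/imagenet/val/n07693725/img.JPEG", example_mounts))
```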

## 1. Installing the TAO launcher <a class="anchor" id="head-1"></a>
The TAO launcher is a python package distributed as a python wheel listed in PyPI. You may install the launcher by executing the following cell.

Please note that TAO Toolkit recommends running the TAO launcher in a virtual env with python 3.6.9. You may follow the instructions on this [page](https://virtualenvwrapper.readthedocs.io/en/latest/install.html) to set up a python virtual env using the `virtualenv` and `virtualenvwrapper` packages. Once you have set up virtualenvwrapper, set the version of python to be used in the virtual env with the `VIRTUALENVWRAPPER_PYTHON` variable. You may do so by running

```sh
export VIRTUALENVWRAPPER_PYTHON=/path/to/bin/python3.x
```

where x >= 6 and <= 8

We recommend performing this step first and then launching the notebook from the virtual environment. In addition to installing TAO python package, please make sure of the following software requirements:
* python >=3.7, <=3.10.x
* docker-ce > 19.03.5
* docker-API 1.40
* nvidia-container-toolkit > 1.3.0-1
* nvidia-container-runtime > 3.4.0-1
* nvidia-docker2 > 2.5.0-1
* nvidia-driver > 525+
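
A quick pre-flight check along these lines can catch a missing prerequisite before the launcher is installed. This is a minimal sketch: it only checks the interpreter version against the range listed above and whether the `docker` and `nvidia-smi` executables are on `PATH`; it does not verify the versions of Docker, the NVIDIA container packages, or the driver. The helper names are hypothetical.

```python
import shutil
import sys


def python_version_supported(version_info, minimum=(3, 7), maximum=(3, 10)):
    """Return True if (major, minor) lies within the inclusive range."""
    major_minor = (version_info[0], version_info[1])
    return minimum <= major_minor <= maximum


def missing_executables(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("Python version supported:", python_version_supported(sys.version_info))
print("Missing executables:", missing_executables(["docker", "nvidia-smi"]))
```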

Once you have installed the prerequisites, log in to the docker registry nvcr.io by running the command below

```sh
docker login nvcr.io
```

You will be prompted to enter a username and password. The username is `$oauthtoken` and the password is the API key generated from `ngc.nvidia.com`. Please follow the instructions in the [NGC setup guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) to generate your own API key.


In [5]:
# SKIP this step IF you have already installed the TAO launcher.
!pip3 install nvidia-tao

## 2. Prepare dataset and pre-trained model <a class="anchor" id="head-2"></a>

### 2.1 Prepare dataset

**Note:** This notebook example uses the 1000 classes of ImageNet. If you are using a custom dataset other than ImageNet, update the `dataset.data` config with a `classes` field that points to a file with the class names; refer to the documentation for more details on the classes text file. Update `num_classes` under `model.head` accordingly. For reference, see `train_cats_dogs.yaml` in the specs of `classification_pyt` under the parent directory, which gives an example of fine-tuning on a 2-class dataset.
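
For a custom dataset laid out as one folder per class, the class list and `num_classes` can be derived from the folder names. The sketch below assumes the classes text file holds one class name per line; that format is an assumption here, so check the TAO documentation before relying on it. `write_classes_file` is a hypothetical helper, and the demo runs on a throwaway directory with two made-up classes.

```python
import os
import tempfile


def write_classes_file(train_dir, classes_path):
    """Write one class name per line, taken from the sub-folder names of train_dir."""
    class_names = sorted(
        entry for entry in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, entry))
    )
    with open(classes_path, "w") as handle:
        handle.write("\n".join(class_names) + "\n")
    return class_names


# Demo on a temporary directory so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat", "dog"):
        os.makedirs(os.path.join(tmp, "train", name))
    names = write_classes_file(os.path.join(tmp, "train"), os.path.join(tmp, "classes.txt"))
    print(names, "-> num_classes =", len(names))
```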

You need to download the ImageNet2012 dataset and format it into train/ val/ test folders. The train and val folders should be unzipped and placed in `$HOST_DATA_DIR/imagenet`.

The data can be downloaded by following the instructions here:
[MMPretrain Imagenet Download Instructions](https://mmpretrain.readthedocs.io/en/latest/user_guides/dataset_prepare.html)

Go to the official [Download page](http://www.image-net.org/download-images). Find the download links for ILSVRC2012 and download the following two files.

* ILSVRC2012_img_train.tar (~138GB)

    * For training, untar the class folders into the `train` folder so that it contains 1000 folders, one per class.

* ILSVRC2012_img_val.tar (~6.3GB)
    * For validation, move the images into their respective class folders. You can use this script to do so: [valprep](https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh).

You can also use this shell script to perform the above 2 steps: [extract_ILSVRC script](https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh).
Example contents of the final train/ and val/ folders look like this:

**./train**

    n07693725
    ...
    n07614500

**./val**

    n07693725
    ...
    n07614500


The above steps can also be performed by the following bash commands, if you have not downloaded the ImageNet 2012 dataset. 

In [None]:
!wget -P $HOST_DATA_DIR https://raw.githubusercontent.com/pytorch/examples/main/imagenet/extract_ILSVRC.sh
!wget -P $HOST_DATA_DIR https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar
!wget -P $HOST_DATA_DIR https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
!(cd $HOST_DATA_DIR ; sh extract_ILSVRC.sh)

In [None]:
# Install the following dependencies for running the dataset preparation scripts
!pip3 install Cython==0.29.36
!pip3 install -r $PROJECT_DIR/requirements-pip.txt
!pip3 install --upgrade "six>=1.17.0,<2.0"

### 2.2. Verify downloaded dataset <a class="anchor" id="head-1-1"></a>
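
Besides counting class folders, it can help to confirm that `train` and `val` contain the same set of classes. The following is a minimal sketch with hypothetical helper names; the demo builds a throwaway directory in which one class is deliberately missing from `val`.

```python
import os
import tempfile


def class_folders(split_dir):
    """Return the set of sub-folder names (one per class) in a split directory."""
    return {
        entry for entry in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, entry))
    }


def compare_splits(train_dir, val_dir):
    """Return (classes missing from val, classes missing from train)."""
    train, val = class_folders(train_dir), class_folders(val_dir)
    return sorted(train - val), sorted(val - train)


# Demo on a temporary directory; the second class is absent from val on purpose.
with tempfile.TemporaryDirectory() as root:
    for split, names in (("train", ["n07693725", "n07614500"]), ("val", ["n07693725"])):
        for name in names:
            os.makedirs(os.path.join(root, split, name))
    demo_result = compare_splits(os.path.join(root, "train"), os.path.join(root, "val"))
    print(demo_result)
```

On a correctly prepared dataset both returned lists are empty.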

In [None]:
!ls -l $HOST_DATA_DIR/imagenet/train

In [None]:
# Please run this cell to ensure that your data has been downloaded properly
import os
train_dir = os.path.join(os.environ["HOST_DATA_DIR"], "imagenet/train")
# ImageNet-1k should have exactly 1000 class folders under train/.
if os.path.isdir(train_dir) and len(os.listdir(train_dir)) == 1000:
    print("ImageNet dataset found.")
else:
    print("Dataset not found or incomplete. Please check the contents of", train_dir)

## 3. Provide training specification <a class="anchor" id="head-2"></a>

We provide specification files to configure the training parameters including:

* dataset:
  * data:
    * samples_per_gpu: Number of samples in a batch
    * workers_per_gpu: Number of workers per GPU
    * train:
      * data_prefix: /data/imagenet/train
      * pipeline: Augmentations Config
    * val:
      * data_prefix: /data/imagenet/val
    * test:
      * data_prefix: /data/imagenet/val
* model:
  * backbone:
    * type: "open_clip"
    * custom_args:
      * model_name: Model arch
    * freeze: true
    * pretrained: the pretrained dataset
  * head:
    * type: LinearClsHead
    * num_classes: number of classes
    * in_channels: 512
    * loss: loss config

* train:
  * train_config:
    * find_unused_parameters: True
    * optimizer: Optimizer Config
    * lr_config: Learning Rate Config 
    * optimizer_config: Optimizer Config
    * runner:
      * max_epochs: Max num of epochs to train
    * checkpoint_config:
      * interval: Intervals at which to save the checkpoint
    * logging:
      * interval: Intervals to do logging
    * evaluation:
      * interval: Interval at which to do evaluation

Please refer to the TAO documentation about Classification to get all the parameters that are configurable.

In [None]:
!cat $HOST_SPECS_DIR/train_imagenet_clip.yaml

## 4. Run TAO training <a class="anchor" id="head-3"></a>
* Provide the sample spec file and the output directory location for models
* WARNING: Training can take from several hours to a full day to complete
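
The training command in this section passes `key=value` arguments such as `train.num_epochs=$EPOCHS` after the spec file. Assuming these act as dotted-path overrides applied on top of the YAML spec (an assumption about the launcher, not something this notebook states), their effect can be sketched on a plain dictionary. `apply_overrides` is a hypothetical helper, not a TAO API, and the starting values are made up.

```python
import copy


def apply_overrides(spec, overrides):
    """Return a copy of `spec` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(spec)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate sections that the spec does not define yet.
            node = node.setdefault(part, {})
        # Values stay strings except plain integers, to keep the sketch small.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return result


spec = {"train": {"num_gpus": 1, "num_epochs": 40}}  # made-up starting values
merged = apply_overrides(
    spec,
    ["train.num_epochs=10", "results_dir=/results/classification_experiment_fm"],
)
print(merged)
```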


**Note:** If you are using NV-Dinov2 pre-trained weights from the NVAIE, the following parameters from the spec file model config should be used:

* model:
  * backbone:
    * type: "vit_large_patch14_dinov2_swiglu"
    * freeze: true
    * pretrained: "/path/to/NV_DINOV2_518.pth"
  * head:
    * type: LinearClsHead
    * num_classes: 1000
    * in_channels: 1024
    * loss:
      * type: CrossEntropyLoss
      * loss_weight: 1.0
      * use_soft: False
    * topk: [1, 5]


**Note:** If you are using a custom dataset other than ImageNet, update the `dataset.data` config with a `classes` field that points to a file with the class names, and update `num_classes` under `model.head` accordingly. See `train_imagenet_clip.yaml` for reference.

### List of Supported Backbones

| model_name      | Pre-trained dataset         | in_channels |
|-----------------|-----------------------------|-------------|
| ViT-B-32        | laion400m_e31,laion400m_e32 | 512         |
| ViT-B-16        | laion400m_e31               | 512         |
| ViT-L-14        | laion400m_e31               | 768         |
| ViT-H-14        | laion2b_s32b_b79k           | 1024        |
| ViT-g-14        | laion2b_s12b_b42k           | 1024        |
| EVA02-L-14      | merged2b_s4b_b131k          | 768         |
| EVA02-L-14-336  | merged2b_s6b_b61k           | 768         |
| EVA02-E-14      | laion2b_s4b_b115k           | 1024        |
| EVA02-E-14-plus | laion2b_s9b_b144k           | 1024        |

Please refer to the license terms at [open_clip_license](https://huggingface.co/models?library=open_clip) for licensing of all the open_clip models.
For a comprehensive list of models supported by the OpenCLIP API, see the [open_clip models list](https://github.com/mlfoundations/open_clip#pretrained-models).
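
Since the head's `in_channels` must match the chosen backbone, a small lookup built from the table above can guard against mismatches when editing the spec. This is a sketch only: the dictionary values are transcribed from the table, the output mirrors the `model` fields listed in section 3, and `build_model_config` is a hypothetical helper rather than part of TAO.

```python
# (pretrained tags, in_channels) transcribed from the table of supported backbones.
SUPPORTED_BACKBONES = {
    "ViT-B-32": (("laion400m_e31", "laion400m_e32"), 512),
    "ViT-B-16": (("laion400m_e31",), 512),
    "ViT-L-14": (("laion400m_e31",), 768),
    "ViT-H-14": (("laion2b_s32b_b79k",), 1024),
    "ViT-g-14": (("laion2b_s12b_b42k",), 1024),
    "EVA02-L-14": (("merged2b_s4b_b131k",), 768),
    "EVA02-L-14-336": (("merged2b_s6b_b61k",), 768),
    "EVA02-E-14": (("laion2b_s4b_b115k",), 1024),
    "EVA02-E-14-plus": (("laion2b_s9b_b144k",), 1024),
}


def build_model_config(model_name, pretrained, num_classes):
    """Return a `model` section whose head matches the chosen backbone."""
    if model_name not in SUPPORTED_BACKBONES:
        raise ValueError(f"Unknown backbone: {model_name}")
    tags, in_channels = SUPPORTED_BACKBONES[model_name]
    if pretrained not in tags:
        raise ValueError(f"{pretrained} is not listed for {model_name}")
    return {
        "backbone": {
            "type": "open_clip",
            "custom_args": {"model_name": model_name},
            "freeze": True,
            "pretrained": pretrained,
        },
        "head": {
            "type": "LinearClsHead",
            "num_classes": num_classes,
            "in_channels": in_channels,
        },
    }


print(build_model_config("ViT-B-32", "laion400m_e31", 1000)["head"])
```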

**Note:** If you are using the Logistic Regression head, the following parameters from the spec file model config should be used:

* model:
  * backbone:
    * freeze: true
    * pretrained: "/path/to/NV_DINOV2_518.pth"
  * head:
    * lr_head:
      * C: 0.316   # tunable
      * max_iter: 5000   # tunable
    * type: LogisticRegressionHead
    * num_classes: 1000

In [None]:
# NOTE: The following paths are set from the perspective of the TAO Docker.

# The data is saved here
%env DATA_DIR = /data
%env SPECS_DIR = /specs
%env RESULTS_DIR = /results

In [None]:
!tao model -h

**Note**: The batch size is set to 128 in the provided spec file, which needs a minimum of 20GB of system memory. You can scale the batch size (BS) based on your available system memory. The recommended batch size is 512 for best performance. Batch size scales with system memory as follows:
* 128 BS -> 20GB system memory
* 256 BS -> 40GB system memory
* 512 BS -> 80GB system memory
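
The three data points above are consistent with linear scaling of 20GB per 128 samples. Under that assumption (an extrapolation from the list, not a measured guarantee), the largest listed batch size that fits a given amount of memory can be picked as follows. The function names are hypothetical.

```python
GB_PER_SAMPLE = 20 / 128  # derived from "128 BS -> 20GB"


def estimated_memory_gb(batch_size):
    """Estimate system memory in GB, assuming linear scaling with batch size."""
    return batch_size * GB_PER_SAMPLE


def largest_batch_size(available_gb, candidates=(128, 256, 512)):
    """Return the largest candidate batch size that fits, or None if none does."""
    fitting = [bs for bs in candidates if estimated_memory_gb(bs) <= available_gb]
    return max(fitting) if fitting else None


for batch_size in (128, 256, 512):
    print(batch_size, "->", estimated_memory_gb(batch_size), "GB")
print("Largest batch size for 48GB:", largest_batch_size(48))
```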

In [None]:
# This is the suitable number of epochs for this model with pretrained weights. Please change this value as needed.
%env EPOCHS = 10
%env NUM_GPUS = 1

print("Train Classification Model")
!tao model classification_pyt train \
                  -e $SPECS_DIR/train_imagenet_clip.yaml \
                  results_dir=$RESULTS_DIR/classification_experiment_fm \
                  train.num_gpus=$NUM_GPUS \
                  train.num_epochs=$EPOCHS

In [None]:
print("To resume from a checkpoint, use the below command. Update the epoch number accordingly")
!tao model classification_pyt train \
                  -e $SPECS_DIR/train_imagenet_clip.yaml \
                  results_dir=$RESULTS_DIR/classification_experiment_fm \
                  train.num_gpus=$NUM_GPUS \
                  train.num_epochs=$EPOCHS \
                  train.resume_training_checkpoint_path=$RESULTS_DIR/classification_experiment_fm/train/classifier_model_latest.pth

In [None]:
print('PyTorch checkpoints:')
print('---------------------')
!ls -ltrh $HOST_RESULTS_DIR/classification_experiment_fm/train

In [None]:
print('Rename a model: Note that the training is not deterministic, so you may change the model name accordingly.')
print('---------------------')
# NOTE: The following command may require `sudo`. You can run the command outside the notebook.
!ls -ltrh $HOST_RESULTS_DIR/classification_experiment_fm/train/classifier_model_latest.pth

## 5. Evaluate trained models <a class="anchor" id="head-4"></a>


Evaluate ImageNet-1k Fine-tuned Classification Model
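
The head configuration shown earlier lists `topk: [1, 5]`, which suggests that top-1 and top-5 accuracy are the metrics of interest. As a reference for reading those numbers, here is a minimal sketch of how top-k accuracy is commonly computed; it is not taken from the TAO implementation, and the scores and labels are made-up toy values.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not labels:
        raise ValueError("labels must not be empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda idx: row[idx], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, three classes; the second sample is only correct within the top 2.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 2]
print("top-1:", topk_accuracy(toy_scores, toy_labels, 1))
print("top-2:", topk_accuracy(toy_scores, toy_labels, 2))
```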

In [None]:
!tao model classification_pyt evaluate \
                    -e $SPECS_DIR/test_clip_imagenet.yaml \
                    evaluate.num_gpus=$NUM_GPUS \
                    evaluate.checkpoint=$RESULTS_DIR/classification_experiment_fm/train/classifier_model_latest.pth \
                    results_dir=$RESULTS_DIR/classification_experiment_fm

## 6. Inferences <a class="anchor" id="head-5"></a>
In this section, we run the classification inference tool to generate inferences with the trained classification models and print the results. 


In [None]:
!tao model classification_pyt inference \
                    -e $SPECS_DIR/test_clip_imagenet.yaml \
                    inference.num_gpus=$NUM_GPUS \
                    inference.checkpoint=$RESULTS_DIR/classification_experiment_fm/train/classifier_model_latest.pth \
                    results_dir=$RESULTS_DIR/classification_experiment_fm

In [None]:
# Visualize the results
!cat $HOST_RESULTS_DIR/classification_experiment_fm/inference/result.csv

Visualize the inference results using the images listed in the CSV file. The file contains the following columns: Image Name, class_label, class_confidence.
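
Before plotting, the predictions can also be summarized directly from the CSV. The sketch below assumes the three columns appear in the order listed above and that the file has no header row; both are assumptions, so adjust the parsing if your `result.csv` differs. The sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def read_predictions(csv_file):
    """Parse (image path, class label, confidence) tuples from an open CSV file."""
    predictions = []
    for row in csv.reader(csv_file):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        predictions.append((row[0], row[1], float(row[2])))
    return predictions


# Made-up rows standing in for result.csv.
sample = io.StringIO(
    "/data/imagenet/val/n07693725/a.JPEG,n07693725,0.91\n"
    "/data/imagenet/val/n07614500/b.JPEG,n07614500,0.48\n"
    "/data/imagenet/val/n07614500/c.JPEG,n07693725,0.55\n"
)
predictions = read_predictions(sample)
label_counts = Counter(label for _, label, _ in predictions)
low_confidence = [path for path, _, confidence in predictions if confidence < 0.6]
print(label_counts)
print(low_confidence)
```

To run it on real output, pass `open(csv_path)` for the `result.csv` produced by the inference step instead of the in-memory sample.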

In [None]:
# Install Deps
!pip3 install pillow
!pip3 install "matplotlib>=3.3.3, <4.0"

In [None]:
import matplotlib.pyplot as plt
from PIL import Image
import os
import csv
import random

DATA_DIR = os.environ.get('HOST_DATA_DIR')       # dataset location on the host
DATA_DOWNLOAD_DIR = os.environ.get('DATA_DIR')   # dataset location inside the TAO container
RESULT_DIR = os.environ.get('HOST_RESULTS_DIR')
csv_path = os.path.join(RESULT_DIR, "classification_experiment_fm", "inference", "result.csv")
results = []
with open(csv_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        # Keep the image path and the predicted class label.
        results.append((row[0], row[1]))
random.shuffle(results)

w,h = 200,200
fig = plt.figure(figsize=(30,30))
columns = 3
rows = 1
# Show up to columns*rows randomly chosen predictions.
for i, (image_path, label) in enumerate(results[:columns*rows], start=1):
    ax = fig.add_subplot(rows, columns, i)
    # The CSV stores container paths; map them back to the host.
    img = Image.open(image_path.replace(DATA_DOWNLOAD_DIR, DATA_DIR))
    img = img.resize((w,h))
    ax.imshow(img)
    ax.set_title(label, fontsize=40)

## 7. Deploy! <a class="anchor" id="head-6"></a>

In [None]:
# Export the Classification model to ONNX model
# NOTE: Export is done on single GPU - GPU num need not be provided

!tao model classification_pyt export \
                   -e $SPECS_DIR/export_imagenet_clip.yaml \
                   export.checkpoint=$RESULTS_DIR/classification_experiment_fm/train/classifier_model_latest.pth \
                   export.onnx_file=$RESULTS_DIR/classification_experiment_fm/export/classification_model_export.onnx \
                   results_dir=$RESULTS_DIR/classification_experiment_fm/

In [None]:
# Generate a TensorRT Engine using TAO Deploy
!tao deploy classification_pyt gen_trt_engine \
                   -e $SPECS_DIR/export_imagenet_clip.yaml \
                   gen_trt_engine.onnx_file=$RESULTS_DIR/classification_experiment_fm/export/classification_model_export.onnx \
                   gen_trt_engine.trt_engine=$RESULTS_DIR/classification_experiment_fm/gen_trt_engine/classification_model_export.engine \
                   results_dir=$RESULTS_DIR/classification_experiment_fm/

In [None]:
# Run evaluation using the generated TensorRT Engine
!tao deploy classification_pyt evaluate \
                   -e $SPECS_DIR/export_imagenet_clip.yaml \
                   evaluate.trt_engine=$RESULTS_DIR/classification_experiment_fm/gen_trt_engine/classification_model_export.engine \
                   results_dir=$RESULTS_DIR/classification_experiment_fm/

In [None]:
# Run inference using the generated TensorRT Engine
!tao deploy classification_pyt inference \
                   -e $SPECS_DIR/export_imagenet_clip.yaml \
                   inference.trt_engine=$RESULTS_DIR/classification_experiment_fm/gen_trt_engine/classification_model_export.engine \
                   results_dir=$RESULTS_DIR/classification_experiment_fm/

In [None]:
# Visualize the results
!cat $HOST_RESULTS_DIR/classification_experiment_fm/trt_inference/result.csv

In [None]:
# Visualize Inference

import matplotlib.pyplot as plt
from PIL import Image
import os
import csv
import random

DATA_DIR = os.environ.get('HOST_DATA_DIR')       # dataset location on the host
DATA_DOWNLOAD_DIR = os.environ.get('DATA_DIR')   # dataset location inside the TAO container
RESULT_DIR = os.environ.get('HOST_RESULTS_DIR')
csv_path = os.path.join(RESULT_DIR, "classification_experiment_fm", "trt_inference", "result.csv")
results = []
with open(csv_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        # Keep the image path and the predicted class label.
        results.append((row[0], row[1]))
random.shuffle(results)

w,h = 200,200
fig = plt.figure(figsize=(30,30))
columns = 5
rows = 1
# Show up to columns*rows randomly chosen predictions.
for i, (image_path, label) in enumerate(results[:columns*rows], start=1):
    ax = fig.add_subplot(rows, columns, i)
    # The CSV stores container paths; map them back to the host.
    img = Image.open(image_path.replace(DATA_DOWNLOAD_DIR, DATA_DIR))
    img = img.resize((w,h))
    ax.imshow(img)
    ax.set_title(label, fontsize=40)

This notebook has come to an end.